Closed
Bug 806494
Opened 12 years ago
Closed 12 years ago
The first sync after startup always fails
Categories
(Firefox :: Sync, defect, P1)
Firefox
Sync
Tracking
()
VERIFIED
FIXED
mozilla20
People
(Reporter: kamil.paral, Assigned: rnewman)
References
Details
Attachments
(9 files, 1 obsolete file)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0
Build ID: 20121026113527
Steps to reproduce:
I use Sync to synchronize Firefox data between home computer and office computer (both Fedora 17 at the moment). On the home computer everything works perfect. But on the office computer, every time I start Firefox (after system reboot, resume, or long period of inactivity) the initial sync fails ("bad username or password" error). If I manually trigger another sync, it sometimes work, sometimes it still fails ("unknown reason" error). Third sync usually works correctly.
I should note that on the office computer I run two instances of Firefox separated by --profile and --no-remote command line options. When I start my "home" Firefox, the "company" Firefox is already running on that machine. That might be relevant.
This has been happening for many months, very regularly (I see it literally all the time).
Reporter | ||
Comment 1•12 years ago
|
||
1351534432702 Sync.Service INFO Logging in user vi4usibtxmx2cgoutp5piyzw2uqtlbod
1351534432702 Sync.Service DEBUG Caching URLs under storage user base: https://scl2-sync4.services.mozilla.com/1.1/vi4usibtxmx2cgoutp5piyzw2uqtlbod/
1351534445602 Sync.Resource DEBUG mesg: GET fail 401 https://scl2-sync4.services.mozilla.com/1.1/vi4usibtxmx2cgoutp5piyzw2uqtlbod/info/collections
1351534445602 Sync.Resource DEBUG GET fail 401 https://scl2-sync4.services.mozilla.com/1.1/vi4usibtxmx2cgoutp5piyzw2uqtlbod/info/collections
1351534445603 Sync.Service WARN 401: login failed.
1351534445603 Sync.Service DEBUG Finding cluster for user vi4usibtxmx2cgoutp5piyzw2uqtlbod
1351534446587 Sync.Resource DEBUG mesg: GET success 200 https://auth.services.mozilla.com/user/1.0/vi4usibtxmx2cgoutp5piyzw2uqtlbod/node/weave
1351534446587 Sync.Resource DEBUG GET success 200 https://auth.services.mozilla.com/user/1.0/vi4usibtxmx2cgoutp5piyzw2uqtlbod/node/weave
1351534446587 Sync.Service DEBUG Cluster value = https://scl2-sync4.services.mozilla.com/
1351534446587 Sync.Status DEBUG Status.login: success.login => error.login.reason.account
1351534446587 Sync.Status DEBUG Status.service: success.status_ok => error.login.failed
Reporter | ||
Comment 2•12 years ago
|
||
1351534576224 Sync.Collection WARN Got exception calling onProgress handler during fetch of https://scl2-sync4.services.mozilla.com/1.1/vi4usibtxmx2cgoutp5piyzw2uqtlbod/storage/history?newer=1351526738.82&full=1
1351534576225 Sync.Collection DEBUG JSON.parse: unexpected character JS Stack trace: deserialize@record.js:65 < @record.js:607 < Channel_onDataAvail@resource.js:520
1351534576226 Sync.Status DEBUG Status for engine history: error.engine.reason.unknown_fail
1351534576226 Sync.Status DEBUG Status.service: success.status_ok => error.sync.failed_partial
1351534576226 Sync.ErrorHandler DEBUG history failed: NS_ERROR_XPC_JS_THREW_JS_OBJECT JS Stack trace: Res_get@resource.js:398 < SyncEngine__processIncoming@engines.js:873 < SyncEngine__sync@engines.js:1284 < WrappedNotify@util.js:142 < Engine_sync@engines.js:481 < WeaveSvc__syncEngine@service.js:1393 < @service.js:1286 < WrappedNotify@util.js:142 < WrappedLock@util.js:97 < _lockedSync@service.js:1186 < @service.js:1177 < WrappedCatch@util.js:71 < sync@service.js:1165
Reporter | ||
Comment 3•12 years ago
|
||
1351534672315 Sync.Resource DEBUG mesg: GET fail 401 https://scl2-sync4.services.mozilla.com/1.1/vi4usibtxmx2cgoutp5piyzw2uqtlbod/info/collections
1351534672315 Sync.Resource DEBUG GET fail 401 https://scl2-sync4.services.mozilla.com/1.1/vi4usibtxmx2cgoutp5piyzw2uqtlbod/info/collections
Reporter | ||
Comment 4•12 years ago
|
||
Reporter | ||
Comment 5•12 years ago
|
||
1351534795951 Sync.Collection WARN Got exception calling onProgress handler during fetch of https://scl2-sync4.services.mozilla.com/1.1/vi4usibtxmx2cgoutp5piyzw2uqtlbod/storage/tabs?full=1
1351534795952 Sync.Collection DEBUG JSON.parse: unexpected character JS Stack trace: deserialize@record.js:65 < @record.js:607 < Channel_onDataAvail@resource.js:520
1351534795952 Sync.Status DEBUG Status for engine tabs: error.engine.reason.unknown_fail
1351534795952 Sync.Status DEBUG Status.service: success.status_ok => error.sync.failed_partial
1351534795953 Sync.ErrorHandler DEBUG tabs failed: NS_ERROR_XPC_JS_THREW_JS_OBJECT JS Stack trace: Res_get@resource.js:398 < SyncEngine__processIncoming@engines.js:873 < SyncEngine__sync@engines.js:1284 < WrappedNotify@util.js:142 < Engine_sync@engines.js:481 < WeaveSvc__syncEngine@service.js:1393 < @service.js:1286 < WrappedNotify@util.js:142 < WrappedLock@util.js:97 < _lockedSync@service.js:1186 < @service.js:1177 < WrappedCatch@util.js:71 < sync@service.js:1165
1351534795953 Sync.Service INFO Sync completed at 2012-10-29 19:19:55 after 21.95 secs.
1351534795953 Sync.SyncScheduler DEBUG Next sync in 90000 ms.
1351534795953 Sync.ErrorHandler DEBUG Some engines did not sync correctly.
Reporter | ||
Comment 6•12 years ago
|
||
Trying to change Product and Component to Firefox Sync, I hope this is correct.
Component: Untriaged → Firefox Sync: Backend
Product: Firefox → Mozilla Services
Version: 16 Branch → unspecified
Comment 7•12 years ago
|
||
I experience similar problems on my Mac [806494]
https://bugzilla.mozilla.org/show_bug.cgi?id=806494
Comment 8•12 years ago
|
||
That should be https://bugzilla.mozilla.org/show_bug.cgi?id=807074
(cannot edit comment)
Comment 9•12 years ago
|
||
Lots of people are having this issue. We're working on it. There's some discussion in bug 805386.
Assignee | ||
Comment 10•12 years ago
|
||
OK, there are two bugs here. One is Bug 805693, and the other is Bug 794652.
I'm going to treat this as the latter.
Kamil, are you still seeing an issue with the current version of Firefox?
No longer depends on: 805693
Flags: needinfo?(kamil.paral)
Reporter | ||
Comment 11•12 years ago
|
||
Hello, Richard, yes I do. Firefox 17.0.1. It might be a bit less frequent, but I still see it very often.
Do you need a fresh set of logs?
Flags: needinfo?(kamil.paral)
Assignee | ||
Comment 12•12 years ago
|
||
(In reply to Kamil Páral from comment #11)
> Hello, Richard, yes I do. Firefox 17.0.1. It might be a bit less frequent,
> but I still see it very often.
>
> Do you need a fresh set of logs?
Yes please!
Reporter | ||
Comment 13•12 years ago
|
||
Ironically I can't trigger it right now, but I attach last tree error logs I see in about:sync-log.
Reporter | ||
Comment 14•12 years ago
|
||
Reporter | ||
Comment 15•12 years ago
|
||
Assignee | ||
Comment 16•12 years ago
|
||
Excellent, thanks for the logs.
You seem to be being bitten by Bug 798430, which is supposed to be fixed in Firefox 17. Evidently that's not the case, so let's look at that.
Priority: -- → P1
Assignee | ||
Comment 17•12 years ago
|
||
Kamil, would you be able to provide me with HTTP logs, as described here?
https://developer.mozilla.org/en-US/docs/HTTP_Logging
Also set
services.sync.log.logger.network.resources
to "Trace" (including capital "T").
Enable HTTP logging, then restart Firefox and trigger a sync with Tools > Sync Now. If you get a yellow bar, great; please email me the new error log and your HTTP log.
You can then set that config value back to "Debug", and restart Firefox.
With any luck, this will provide enough information to figure out what's going on…
Thanks!
Flags: needinfo?(kamil.paral)
Reporter | ||
Comment 19•12 years ago
|
||
I've emailed you the logs, Richard. It seems to me that setting services.sync.log.logger.network.resources to Trace triggers the error much more often than Debug. I've hit it on the first attempt. But maybe it's just coincidence.
Flags: needinfo?(kamil.paral)
Assignee | ||
Comment 20•12 years ago
|
||
09:28:21.098549 617957184[590]: Creating HttpBaseChannel @f3d8000
09:28:21.098552 617957184[590]: Creating nsHttpChannel [this=7f2b0f3d8000]
09:28:21.098558 617957184[590]: HttpBaseChannel::Init [this=7f2b0f3d8000]
09:28:21.098563 617957184[590]: host=scl2-sync4.services.mozilla.com port=-1
09:28:21.098566 617957184[590]: uri=https://scl2-sync4.services.mozilla.com/1.1/USERNAME/info/collections
09:28:21.098568 617957184[590]: Creating nsHttpConnectionInfo @bab1d40
09:28:21.098577 617957184[590]: nsHttpChannel::Init [this=7f2b0f3d8000]
09:28:21.098583 617957184[590]: nsHttpChannel::SetupReplacementChannel [this=7f2b0efcb000 newChannel=7f2b0f3d8050 preserveMethod=1]
09:28:21.098586 617957184[590]: HttpBaseChannel::SetupReplacementChannel [this=7f2b0efcb000 newChannel=7f2b0f3d8050 preserveMethod=1]
09:28:21.098657 617957184[590]: HttpBaseChannel::SetApplyConversion [this=7f2b0f3d8000 value=1]
09:28:21.098665 617957184[590]: nsHttpChannel::WaitForRedirectCallback [this=7f2b0efcb000]
09:28:21.098672 617957184[590]: nsHttpChannel::CloseCacheEntry [this=7f2b095f8800] mStatus=804b001e mCacheAccess=2
09:28:21.098675 617957184[590]: dooming cache entry!!
09:28:21.099034 617957184[590]: nsHttpChannel::OnRedirectVerifyCallback [this=7f2b0efcb000] result=0 stack=2 mWaitingForRedirectCallback=1
09:28:21.099041 617957184[590]: nsHttpChannel::AsyncOpen [this=7f2b0f3d8000]
09:28:21.099048 617957184[590]: Resolving host [scl2-sync4.services.mozilla.com].
09:28:21.099070 617957184[590]: HttpBaseChannel::SetRequestHeader [this=7f2b0f3d8000 header="Cookie" value="" merge=0]
09:28:21.099074 617957184[590]: nsHttpChannelAuthProvider::AddAuthorizationHeaders? [this=7f2b0b81f6e0 channel=7f2b0f3d82d0]
09:28:21.099079 617957184[590]: nsHttpAuthCache::GetAuthEntryForPath [key=https://scl2-sync4.services.mozilla.com:-1 path=/1.1/USERNAME/info/]
09:28:21.099084 617957184[590]: nsHttpHandler::NotifyObservers [chan=f3d8050 event="http-on-modify-request"]
09:28:21.099156 617957184[590]: nsHttpChannel::Connect [this=7f2b0f3d8000]
09:28:21.099165 617957184[590]: nsHttpConnectionMgr::SpeculativeConnect [ci=.S.scl2-sync4.services.mozilla.com:443]
09:28:21.099170 617957184[590]: STS dispatch [7f2b0f20a6c0]
09:28:21.099178 617957184[590]: nsHttpChannel::OpenCacheEntry [this=7f2b0f3d8000]
09:28:21.099179 442496768[ae0]: ...returned after 2 milliseconds
09:28:21.099182 617957184[590]: OpenCacheEntry failed [rv=80040111]
09:28:21.099186 617957184[590]: nsHttpChannel::SetupTransaction [this=7f2b0f3d8000]
09:28:21.099186 442496768[ae0]: nsHttpConnectionMgr::OnMsgSpeculativeConnect [ci=.S.scl2-sync4.services.mozilla.com:443]
09:28:21.099192 442496768[ae0]: Creating nsHttpConnectionInfo @f27ecc0
09:28:21.099193 617957184[590]: Creating nsHttpTransaction @86c1650
09:28:21.099197 617957184[590]: nsHttpTransaction::Init [this=86c1650 caps=9]
09:28:21.099198 442496768[ae0]: nsHttpConnectionMgr::AtActiveConnectionLimit [ci=.S.scl2-sync4.services.mozilla.com:443 caps=1]
09:28:21.099201 442496768[ae0]: connection count = 0, limit 6
09:28:21.099204 442496768[ae0]: result: false
09:28:21.099206 617957184[590]: http request [
09:28:21.099207 442496768[ae0]: Creating nsHalfOpenSocket [this=7f2b0b8add30 trans=7f2b0b8670b0 ent=scl2-sync4.services.mozilla.com]
09:28:21.099210 617957184[590]: GET /1.1/USERNAME/info/collections HTTP/1.1
09:28:21.099213 442496768[ae0]: creating nsSocketTransport @71a10a0
09:28:21.099214 617957184[590]: Host: scl2-sync4.services.mozilla.com
09:28:21.099219 442496768[ae0]: nsSocketTransport::Init [this=71a10a0 host=scl2-sync4.services.mozilla.com:443 proxy=:0]
09:28:21.099222 617957184[590]: User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20100101 Firefox/17.0
09:28:21.099226 617957184[590]: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
09:28:21.099227 442496768[ae0]: Reset callbacks for secinfo=0 callbacks=7f2b0b8add40
09:28:21.099229 617957184[590]: Accept-Language: cs,en-us;q=0.7,en;q=0.3
09:28:21.099230 442496768[ae0]: nsSocketTransport::OpenOutputStream [this=71a10a0 flags=2]
09:28:21.099232 617957184[590]: Accept-Encoding: gzip, deflate
09:28:21.099233 442496768[ae0]: nsSocketTransport::PostEvent [this=7f2b071a10a0 type=0 status=0 param=0]
09:28:21.099235 617957184[590]: Connection: keep-alive
09:28:21.099237 442496768[ae0]: STS dispatch [7f2b0b1a9df0]
09:28:21.099239 617957184[590]: Pragma: no-cache
09:28:21.099242 617957184[590]: Cache-Control: no-cache
To me, this looks like an internal redirect occurs, then auth headers are set. Honza et al: does this (a) show any evidence of custom headers being set in the async redirect handler, or (b) appear to show necko setting auth and cookie headers regardless of what happens inside the redirect handler?
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 21•12 years ago
|
||
Added some logging:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b7e8f82a341
https://hg.mozilla.org/services/services-central/rev/5c5e1e93e3a9
Assignee: nobody → rnewman
Status: NEW → ASSIGNED
Whiteboard: [leave open]
Comment 22•12 years ago
|
||
(In reply to Richard Newman [:rnewman] from comment #20)
> To me, this looks like an internal redirect occurs, then auth headers are
> set. Honza et al: does this (a) show any evidence of custom headers being
> set in the async redirect handler, or (b) appear to show necko setting auth
> and cookie headers regardless of what happens inside the redirect handler?
Do you have a complete log?
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 23•12 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #22)
> Do you have a complete log?
Sent to your m.c address.
Comment 25•12 years ago
|
||
Comment 26•12 years ago
|
||
(In reply to Ryan VanderMeulen from comment #25)
> https://hg.mozilla.org/mozilla-central/rev/4b7e8f82a341
Thanks for doing this. I'm going to contact Kamil directly and let him try this build.
I've modified locally one of our xpcshell necko tests to check whether copying "Authorization" header when we go the AsyncDoReplaceWithProxy path works. Header is copied correctly.
Reporter | ||
Comment 28•12 years ago
|
||
I was unable to reproduce the problem using FF Nightly. Maybe it was fixed in the mean time? I provided Honza Bambas with the success log, maybe you can find some details there.
Assignee | ||
Comment 29•12 years ago
|
||
Comment 30•12 years ago
|
||
(In reply to Kamil Páral from comment #28)
> I was unable to reproduce the problem using FF Nightly. Maybe it was fixed
> in the mean time? I provided Honza Bambas with the success log, maybe you
> can find some details there.
After looking at the logs again I can say:
In the old (faulty) log I can see we call SetRequestHeader for "user-agent" and "authorization" only on the *old* channel. But not on the *new* channel, so the code to copy headers is apparently faulty ("authorization" is not in the array of headers to copy? Could case sensitivity be involved? I was trying to understand the piece of code around headers but it was very complicated).
In the new (success) log I don't see any redirect, so hard to say.
Assignee | ||
Comment 31•12 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #30)
> In the old (faulty) log I can see we call SetRequestHeader for "user-agent"
> and "authorization" only on the *old* channel. But not on the *new*
> channel, so the code to copy headers is apparently faulty ("authorization"
> is not in the array of headers to copy? Could case sensitivity be involved?
> I was trying to understand the piece of code around headers but it was very
> complicated).
I *hope* case-sensitivity isn't involved -- we originally set the headers using the same strings that we then use for lookup when copying.
1. AsyncResource.setHeader writes into this._headers with the lowercased name.
2. AsyncResource._createRequest walks those headers (plus the ones from the authenticator), calling channel.setRequestHeader(key, value, false). It also creates and registers ChannelNotificationListener with the header names that it used.
3. When that listener is passed two channels, it copies the named headers from the old to the new.
We know that code gets to run in Kamil's case:
1355822901098 Sync.Resource TRACE Copying headers for safe internal redirect.
but until that last logging patch, we didn't get to see if individual headers were actually copied. Now that Kamil can't repro, we're out of luck.
But this is part of a broader issue: I have no way to test this code in the context of a redirect, because as far as I can tell there is no way from JS (or maybe even from C++ until Bug 765934 lands) to cause an internal redirect to occur. The only way we *were* able to do it -- setting a PAC file -- disappeared a few releases ago.
So this code is essentially untested, and will remain so until there's some way to test internal redirect handlers. :/
How likely is it that Necko will add a channel flag that forces an internal redirect to allow testing?
Reporter | ||
Comment 32•12 years ago
|
||
I have tried Nightly just a few times and the problem didn't occur. Unfortunately with FF 17 I don't see it consistently either, the error happened yesterday but not today. I'll try to use Nightly full-time for a week and I'll report back whether I'll see anything suspicious.
Comment 33•12 years ago
|
||
I can see two reasons this doesn't work:
- copy header code doesn't work
- auth cache is involved in clearing or overriding the header somehow, however it seems not to be likely according Kamil's faulty logs
(In reply to Richard Newman [:rnewman] from comment #31)
> But this is part of a broader issue: I have no way to test this code in the
> context of a redirect, because as far as I can tell there is no way from JS
> (or maybe even from C++ until Bug 765934 lands) to cause an internal
> redirect to occur. The only way we *were* able to do it -- setting a PAC
> file -- disappeared a few releases ago.
Not true. test_proxy-failover_passing.js is an example how to invoke an internal redirect when proxy is failing. It is using PAC.
I did some modifications to it to check we don't drop Authorization header on internal redirect, no luck to reproduce this issue, but you can get inspired to build a test.
>
> So this code is essentially untested, and will remain so until there's some
> way to test internal redirect handlers. :/
>
> How likely is it that Necko will add a channel flag that forces an internal
> redirect to allow testing?
You can apply the patch from bug 765934 to try.
Assignee | ||
Comment 34•12 years ago
|
||
Well, that'll do it:
Error copying headers: newChannel.setRequestHeader is not a function JS Stack trace: asyncOnChannelRedirect@resource.js:639
Assignee | ||
Comment 35•12 years ago
|
||
Thanks for the test pointer, Honza. I managed to write a test, and found that the problem was a failure to QI the new channel, so we failed to set the header. With the additional logging I added earlier in this bug, the failure was obvious once reproduced.
Attachment #698221 -
Flags: review?(gps)
Assignee | ||
Comment 36•12 years ago
|
||
Slightly tidier version.
Note that rest.js already QIs appropriately, and correctly includes the third argument to setRequestHeader, and thus is not affected by this bug. If we had less legacy code, we'd have fewer problems.
Attachment #698221 -
Attachment is obsolete: true
Attachment #698221 -
Flags: review?(gps)
Attachment #698222 -
Flags: review?(gps)
Comment 37•12 years ago
|
||
Comment on attachment 698222 [details] [diff] [review]
Proposed patch. v2
Review of attachment 698222 [details] [diff] [review]:
-----------------------------------------------------------------
Please land on inbound so this makes 20.
Attachment #698222 -
Flags: review?(gps) → review+
Assignee | ||
Comment 38•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/08570379483e
If this stays green and cheerful, I will request uplift to beta next week.
OS: Linux → All
Hardware: x86_64 → All
Whiteboard: [leave open]
Target Milestone: --- → mozilla20
Comment 39•12 years ago
|
||
And how is it that getRequestHeader works on the old channel that is not QI'ed to nsIHttpChannel?
Comment 40•12 years ago
|
||
And by the way, if I'm not wrong, we then never was able to copy the headers, so how that could fix the 401 uptick?
And even more, how is that the error log is not visible in the faulty logs from Kamil? Could it be that _log.error() doesn't work for some reason on release branches?
Assignee | ||
Comment 41•12 years ago
|
||
> And how is it that getRequestHeader works on the old channel that is not
> QI'ed to nsIHttpChannel?
Presumably because it's the same object that we already QIed when we set the headers in the first place. newChannel is the only new object.
> And by the way, if I'm not wrong, we then never was able to copy the
> headers, so how that could fix the 401 uptick?
It didn't fix the uptick, but the numbers weren't conclusive enough to tell. (Some 401s are acceptable, and the problem is apparently not easy to repro in the wild.) So it was "wait and see"… and we were lucky enough to get one log that confirmed that the bug was still present, after several users reported that fx17 fixed the problem for them.
> And even more, how is that the error log is not visible in the faulty logs
> from Kamil? Could it be that _log.error() doesn't work for some reason on
> release branches?
His error log was produced before that additional logging code landed, and now he can't repro. Error logs work just fine in all branches -- I rely on them a lot :D
Comment 42•12 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 43•12 years ago
|
||
I wasn't able to reliably reproduce this to begin with. Reporter, can you verify this is fixed on Nightly builds?
Reporter | ||
Comment 44•12 years ago
|
||
I have been using Nightly since 2013-01-03 (i.e. Nightly not containing the fix). I haven't seen a single error log since that time.
Reporter | ||
Comment 45•12 years ago
|
||
It is a week now that I've been using Nightly and I haven't yet seen any error.log. That means I can't confirm whether the commit resolved the issue or not, maybe something else resolved it in the past. But the good news is that with Nightly this is no longer a problem.
Comment 46•12 years ago
|
||
Kamil, That's great news. Thank you for following up. I'm going to verify this per your comments.
Status: RESOLVED → VERIFIED
Updated•6 years ago
|
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in
before you can comment on or make changes to this bug.
Description
•