Closed Bug 704539 Opened 13 years ago Closed 12 years ago

channel.URI is undefined in Sync

Categories

(Firefox :: Sync, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla11
Tracking Status
firefox10 + ---
firefox11 + fixed
firefox12 --- fixed

People

(Reporter: aelilea, Assigned: gps)

References

Details

(Whiteboard: [fixed in services])

Attachments

(2 files, 5 obsolete files)

User Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0a2) Gecko/20111121 Firefox/10.0a2
Build ID: 20111121042024



Actual results:

Received error messages "Sync encountered an error while syncing: Unknown error". 
Looking at the sync logs, the error is always in the tab sync part (it has happened three times). The attached log is typical.

NB This is on the latest Aurora.
Still happening.

1322860492810	Sync.Collection	WARN	Caught unexpected exception channel.URI is undefined JS Stack trace: _onComplete(null,null,[object XPCWrappedNative_NoHelper])@resource.js:334 < innerBind(null,null,[object XPCWrappedNative_NoHelper])@util.js:1071 < Channel_onStopRequest([object XPCWrappedNative_NoHelper],null,0)@resource.js:563 in _onComplete.
1322860492810	Sync.Collection	DEBUG	JS Stack trace: _onComplete(null,null,[object XPCWrappedNative_NoHelper])@resource.js:334 < innerBind(null,null,[object XPCWrappedNative_NoHelper])@util.js:1071 < Channel_onStopRequest([object XPCWrappedNative_NoHelper],null,0)@resource.js:563
1322860492810	Sync.Collection	DEBUG	Caught exception channel.visitResponseHeaders is not a function JS Stack trace: _onComplete(null,null,[object XPCWrappedNative_NoHelper])@resource.js:361 < innerBind(null,null,[object XPCWrappedNative_NoHelper])@util.js:1071 < Channel_onStopRequest([object XPCWrappedNative_NoHelper],null,0)@resource.js:563 visiting headers in _onComplete.
1322860492810	Sync.Collection	DEBUG	JS Stack trace: _onComplete(null,null,[object XPCWrappedNative_NoHelper])@resource.js:361 < innerBind(null,null,[object XPCWrappedNative_NoHelper])@util.js:1071 < Channel_onStopRequest([object XPCWrappedNative_NoHelper],null,0)@resource.js:563
1322860492811	Sync.Status	DEBUG	Status for engine tabs: error.engine.reason.record_download_fail
1322860492811	Sync.Status	DEBUG	Status.service: success.status_ok => error.sync.failed_partial
1322860492811	Sync.ErrorHandler	DEBUG	tabs failed: null No traceback available
13
Thanks for the report. I'll be investigating this when I get some time… sorry for the delay.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Error on tab sync → channel.URI is undefined in tab sync
Flagging this as P2 ("system reliability fix"), because it seems to be a reliable failure for the people it affects, and it seems to affect nightlies on down.
Severity: normal → major
Priority: -- → P2
I know nothing about computers I only check my email very seldom do I send any out.
I have other emails am I supposed to let you guys know them so you can find out where all the bugs are coming from?
Tracking for Firefox 10 and 11 since we've now gotten a few reports of this error.
Summary: channel.URI is undefined in tab sync → channel.URI is undefined in Sync
Hey Richard - sending over to you since I think you're heading up the investigation, and we don't want any tracked&unassigned bugs for Beta 10. Please take a look soon, or let us know if you don't think this warrants fixing for 10.
Assignee: nobody → rnewman
(In reply to Alex Keybl [:akeybl] from comment #9)
> Hey Richard - sending over to you since I think you're heading up the
> investigation, and we don't want any tracked&unassigned bugs for Beta 10.

Thanks!

> Please take a look soon, or let us know if you don't think this warrants
> fixing for 10.

I would love to have it fixed for 10, but I have neither any leads nor any free time to chase it down. Android Sync has really taken over my entire life :/

And so: over to you, gps! I think you've got some free cycles for a week… let's see if you can reproduce and fix this as a break from reconciling :D

This bug smells like it's related to Bug 690170, and the further work that I'd like to do to tidy that up (see comments in that bug). Follow that trail and see where it leads?
Assignee: rnewman → gps
Attached patch Handle channel redirects, v1 (obsolete) — Splinter Review
Presumably the issues are due to the underlying channel being redirected. This patch updates both HTTP channel producers to handle nsIChannelEventSink.asyncOnChannelRedirect().

I also addressed some lingering comments from bug 690170.

This patch is *not* up for review because it doesn't include new tests. However, existing xpcshell tests all pass.

In related news, this is the first time I've mucked around with nsI-level HTTP code. So, I wouldn't be surprised if I were missing something obvious.
Attachment #586497 - Flags: feedback?(rnewman)
Attachment #586497 - Flags: feedback?(bzbarsky)
Comment on attachment 586497 [details] [diff] [review]
Handle channel redirects, v1

Seems fine to me.
Attachment #586497 - Flags: feedback?(bzbarsky) → feedback+
Attachment #586497 - Flags: feedback?(rnewman) → feedback+
Attached patch Better channel handling, v2 (obsolete) — Splinter Review
Changes:

* try..catch around channel.QueryInterface, just in case
* always channel.QueryInterface(Ci.nsIHttpChannel) on xpcom handlers because the QI might have been lost
* install nsIChannelEventSink.asyncOnChannelRedirect() on channel listener. For RESTRequest, update this.channel through it.

This includes new tests. However, I wasn't actually able to reproduce the failures, so saying "all the tests pass" doesn't mean as much as I would like it to. That said, all the tests pass.

The only thing I'm not certain about is whether I'm doing the right thing in RESTRequest.onStartRequest and onStopRequest if the QI fails. I experimented with calling onComplete() with an error, but it broke some random tests. I didn't want to go down the rabbit hole. If you tell me what I should do if the QI fails, I'll code it up and change existing tests (but I would prefer not to, since things like J-PAKE were failing and I'd rather not bloat scope).
Attachment #586497 - Attachment is obsolete: true
Attachment #587213 - Flags: review?(rnewman)
Attachment #587213 - Flags: feedback?(bzbarsky)
Comment on attachment 587213 [details] [diff] [review]
Better channel handling, v2

One option in onStartRequest is to just cancel the channel with some error code if it's not what you expect.

That will then call your onStopRequest and all that.  Note that the attached code likely calls _onComplete multiple times in those failure conditions...

I doubt this needs more input from me, in any case.  ;)
Attachment #587213 - Flags: feedback?(bzbarsky)
Attached patch Better channel handling, v3 (obsolete) — Splinter Review
Now calling channel.cancel() if we can't QI, per feedback.
Attachment #587213 - Attachment is obsolete: true
Attachment #587213 - Flags: review?(rnewman)
Attachment #587473 - Flags: review?(rnewman)
Comment on attachment 587473 [details] [diff] [review]
Better channel handling, v3

Review of attachment 587473 [details] [diff] [review]:
-----------------------------------------------------------------

Requesting final skim feedback, because this is a little close to voodoo for someone not intimately familiar with the edge cases. Otherwise, this looks good to me. Looking forward to seeing a green s-c run :)
Attachment #587473 - Flags: review?(rnewman)
Attachment #587473 - Flags: review?(bzbarsky)
Attachment #587473 - Flags: review+
Comment on attachment 587473 [details] [diff] [review]
Better channel handling, v3

This still looks fine, still modulo the multiple calls to _onComplete.  I have no idea whether those are ok.  Please check with someone who knows that code?
Attachment #587473 - Flags: review?(bzbarsky) → review+
Attached patch Better channel handling, v4 (obsolete) — Splinter Review
This sets this._onComplete = this._onProgress = null in resource.js. Pretty sure this addresses the lingering concerns. All tests pass.
Attachment #587473 - Attachment is obsolete: true
Attachment #588542 - Flags: review?(rnewman)
Comment on attachment 588542 [details] [diff] [review]
Better channel handling, v4

Review of attachment 588542 [details] [diff] [review]:
-----------------------------------------------------------------

That'll do, but perhaps check for null and log prior to invoking onFoo?
Attachment #588542 - Flags: review?(rnewman) → review+
Attached patch Better channel handling, v5 (obsolete) — Splinter Review
Guard onFoo() calls.
Attachment #588542 - Attachment is obsolete: true
Attachment #589239 - Flags: review?(rnewman)
Comment on attachment 589239 [details] [diff] [review]
Better channel handling, v5

Review of attachment 589239 [details] [diff] [review]:
-----------------------------------------------------------------

Just switch warns to errors.

::: services/sync/modules/resource.js
@@ +627,5 @@
>      let error = Components.Exception("Aborting due to channel inactivity.",
>                                       Cr.NS_ERROR_NET_TIMEOUT);
> +    if (!this._onComplete) {
> +      this._log.warn("Unexpected error: _onComplete not defined in " +
> +                     "abortRequest.");

this._log.error("_onComplete not defined in abortRequest.");

::: services/sync/modules/rest.js
@@ +366,5 @@
>      let error = Components.Exception("Aborting due to channel inactivity.",
>                                       Cr.NS_ERROR_NET_TIMEOUT);
> +    if (!this.onComplete) {
> +      this._log.warn("Unexpected error: onComplete not defined in " +
> +                     "abortTimeout.")

As above.
Attachment #589239 - Flags: review?(rnewman) → review+
Pushed to s-c: https://hg.mozilla.org/services/services-central/rev/3d526c2ac8e8
Status: NEW → ASSIGNED
Whiteboard: [fixed in services]
Merged into m-c: https://hg.mozilla.org/mozilla-central/rev/3d526c2ac8e8
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
This is tracking for 10 and 11. However, I'm not sure how to proceed. I'm kinda nervous about asking for 10 approval given the short time before it is released. But, if this is a regression (it isn't apparent from looking at the bug fields), then I suppose 10 uplift should be considered.

rnewman/mconnor: what are your thoughts?
(In reply to Gregory Szorc [:gps] from comment #25)

> rnewman/mconnor: what are your thoughts?

It's not really a huge change, it's had extensive review, it's got some test coverage, and it's been QAed. I suggest a day or two baking on m-c, request approval for Aurora, and if nothing awful happens ask for Beta.

We did this work because we were tracking an Fx10 issue, and it's a fairly compact change that addresses probably our most frequent source of user bug reports.
verified with m-c build of 20120118

Reporter, please confirm you no longer see this.  At the moment, the fix is only on Nightly.
Status: RESOLVED → VERIFIED
This is the actual patch that landed in m-c.

[Approval Request Comment]
Regression caused by (bug #): The issue has apparently been present for a long time.
User impact if declined: Continued Sync errors for some users (not quantified)
Testing completed (on m-c, etc.): Landed on m-c a few days ago. No new bugs filed.
Risk to taking this patch (and alternatives if risky): Only 2 weeks left until uplift.

I'm requesting approval to Aurora and Beta because this bug is tracking for them. However, I'm personally worried about pushing to Beta given the limited time to test before it goes to stable. As stated in previous comments, the bug has undergone review by multiple people, so it *should* be safe. I guess the release drivers have the last word.
Attachment #589239 - Attachment is obsolete: true
Attachment #589910 - Flags: review+
Attachment #589910 - Flags: approval-mozilla-beta?
Attachment #589910 - Flags: approval-mozilla-aurora?
(In reply to Tracy Walker [:tracy] from comment #27)
> Reporter, please confirm you no longer see this.  At the moment, the fix is
> only on Nightly.

Since the bug is not reproducible on demand (it is intermittent) and I don't want to risk my main profile on nightly, I will report after it has landed in Aurora whether there are any occurrences within (say) a week.

Thanks for fixing this!
Comment on attachment 589910 [details] [diff] [review]
Better channel handling, v6

[Triage Comment]
Thanks for the risk evaluation gps. Since we're hesitant about the fix, and the volume of reports has dropped since this was first tracked, we can let this ride the train.
Attachment #589910 - Flags: approval-mozilla-beta?
Attachment #589910 - Flags: approval-mozilla-beta-
Attachment #589910 - Flags: approval-mozilla-aurora?
Attachment #589910 - Flags: approval-mozilla-aurora-
Depends on: responsestatus
I see there have been a number of dupes for this bug now - do you feel that these reports warrant uplifting to Aurora 11, gps?
(In reply to Alex Keybl [:akeybl] from comment #41)
> I see there have been a number of dupes for this bug now - do you feel that
> these reports warrant uplifting to Aurora 11, gps?

Thumbs up from me, FWIW. It looks pretty apparent that my earlier partial fix didn't address the whole issue for all users. I'll let gps give his own perspective, of course.
(In reply to Alex Keybl [:akeybl] from comment #41)
> I see there have been a number of dupes for this bug now - do you feel that
> these reports warrant uplifting to Aurora 11, gps?

I wouldn't have originally requested approval if I didn't think this bug warranted uplift. So, yes, I would like to see this uplifted to whatever version(s) the release drivers find appropriate. Furthermore, if it isn't too late to push out a new Aurora version before 11 goes to Beta, I would like to request that (both for this and bug 714202).
Comment on attachment 589910 [details] [diff] [review]
Better channel handling, v6

[Triage Comment]
Originally denied for Aurora due to low volume of duplicate reports. That has since changed, so approving for Aurora 11.
Attachment #589910 - Flags: approval-mozilla-aurora- → approval-mozilla-aurora+
Pushed to Aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/9b079d6b0c33
Target Milestone: --- → mozilla11
Awesome! Looking forward to a drop in bug reports.
(In reply to ael from comment #0)
> Created attachment 576202 [details]
> error-1321984220130.txt
> 
> User Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0a2) Gecko/20111121
> Firefox/10.0a2
> Build ID: 20111121042024
> 
> 
> 
> Actual results:
> 
> Received error messages "Sync encountered an error while syncing: Unknown
> error". 
> Looking at the sync logs, the error is always in the tab sync part (it has
> happened three times). The attached log is typical.
> 
> NB This is on the latest Aurora.
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: