Last Comment Bug 704539 - channel.URI is undefined in Sync
: channel.URI is undefined in Sync
Status: VERIFIED FIXED
[fixed in services]
:
Product: Cloud Services
Classification: Client Software
Component: Firefox Sync: Backend (show other bugs)
: unspecified
: All All
: P2 major (vote)
: mozilla11
Assigned To: Gregory Szorc [:gps]
:
:
Mentors:
: 599965 669176 669589 672878 686468 686935 693531 693729 694676 695077 705279 706565 710115 713910 718043 723408 725950 734870 (view as bug list)
Depends on: responsestatus
Blocks:
  Show dependency treegraph
 
Reported: 2011-11-22 10:36 PST by ael
Modified: 2012-03-12 15:28 PDT (History)
28 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
+
+
fixed
fixed


Attachments
error-1321984220130.txt (12.96 KB, text/plain)
2011-11-22 10:36 PST, ael
no flags Details
Handle channel redirects, v1 (7.52 KB, patch)
2012-01-06 11:38 PST, Gregory Szorc [:gps]
bzbarsky: feedback+
rnewman: feedback+
Details | Diff | Splinter Review
Better channel handling, v2 (14.27 KB, patch)
2012-01-09 17:31 PST, Gregory Szorc [:gps]
no flags Details | Diff | Splinter Review
Better channel handling, v3 (14.32 KB, patch)
2012-01-10 14:24 PST, Gregory Szorc [:gps]
rnewman: review+
bzbarsky: review+
Details | Diff | Splinter Review
Better channel handling, v4 (15.41 KB, patch)
2012-01-13 15:19 PST, Gregory Szorc [:gps]
rnewman: review+
Details | Diff | Splinter Review
Better channel handling, v5 (17.33 KB, patch)
2012-01-17 10:47 PST, Gregory Szorc [:gps]
rnewman: review+
Details | Diff | Splinter Review
Better channel handling, v6 (17.86 KB, patch)
2012-01-19 10:49 PST, Gregory Szorc [:gps]
gps: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta-
Details | Diff | Splinter Review

Description ael 2011-11-22 10:36:00 PST
Created attachment 576202 [details]
error-1321984220130.txt

User Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0a2) Gecko/20111121 Firefox/10.0a2
Build ID: 20111121042024



Actual results:

Received error messages "Sync encountered an error while syncing: Unknown error". 
Looking at the sync logs, the error is always in the tab sync part (it has happened three times). The attached log is typical.

NB This is on the latest Aurora.
Comment 1 ael 2011-12-02 13:21:22 PST
Still happening.

1322860492810	Sync.Collection	WARN	Caught unexpected exception channel.URI is undefined JS Stack trace: _onComplete(null,null,[object XPCWrappedNative_NoHelper])@resource.js:334 < innerBind(null,null,[object XPCWrappedNative_NoHelper])@util.js:1071 < Channel_onStopRequest([object XPCWrappedNative_NoHelper],null,0)@resource.js:563 in _onComplete.
1322860492810	Sync.Collection	DEBUG	JS Stack trace: _onComplete(null,null,[object XPCWrappedNative_NoHelper])@resource.js:334 < innerBind(null,null,[object XPCWrappedNative_NoHelper])@util.js:1071 < Channel_onStopRequest([object XPCWrappedNative_NoHelper],null,0)@resource.js:563
1322860492810	Sync.Collection	DEBUG	Caught exception channel.visitResponseHeaders is not a function JS Stack trace: _onComplete(null,null,[object XPCWrappedNative_NoHelper])@resource.js:361 < innerBind(null,null,[object XPCWrappedNative_NoHelper])@util.js:1071 < Channel_onStopRequest([object XPCWrappedNative_NoHelper],null,0)@resource.js:563 visiting headers in _onComplete.
1322860492810	Sync.Collection	DEBUG	JS Stack trace: _onComplete(null,null,[object XPCWrappedNative_NoHelper])@resource.js:361 < innerBind(null,null,[object XPCWrappedNative_NoHelper])@util.js:1071 < Channel_onStopRequest([object XPCWrappedNative_NoHelper],null,0)@resource.js:563
1322860492811	Sync.Status	DEBUG	Status for engine tabs: error.engine.reason.record_download_fail
1322860492811	Sync.Status	DEBUG	Status.service: success.status_ok => error.sync.failed_partial
1322860492811	Sync.ErrorHandler	DEBUG	tabs failed: null No traceback available
13
Comment 2 Richard Newman [:rnewman] 2011-12-04 00:25:46 PST
Thanks for the report. I'll be investigating this when I get some time… sorry for the delay.
Comment 3 Richard Newman [:rnewman] 2011-12-04 00:25:58 PST
*** Bug 706565 has been marked as a duplicate of this bug. ***
Comment 4 Richard Newman [:rnewman] 2011-12-04 00:28:26 PST
Flagging this as P2 ("system reliability fix"), because it seems to be a reliable failure for the people it affects, and it seems to affect nightlies on down.
Comment 5 Hope Stuckman 2011-12-04 00:52:20 PST
I know nothing about computers I only check my email very seldom do I send any out.
I have other emails am I supposed to let you guys know them so you can find out where all the bugs are coming from?
Comment 6 Philipp von Weitershausen [:philikon] 2011-12-12 20:01:32 PST
*** Bug 710115 has been marked as a duplicate of this bug. ***
Comment 7 Alex Keybl [:akeybl] 2011-12-27 07:04:57 PST
Tracking for Firefox 10 and 11 since we've now gotten a few reports of this error.
Comment 8 Richard Newman [:rnewman] 2011-12-28 12:32:31 PST
*** Bug 713910 has been marked as a duplicate of this bug. ***
Comment 9 Alex Keybl [:akeybl] 2012-01-05 18:25:04 PST
Hey Richard - sending over to you since I think you're heading up the investigation, and we don't want any tracked&unassigned bugs for Beta 10. Please take a look soon, or let us know if you don't think this warrants fixing for 10.
Comment 10 Richard Newman [:rnewman] 2012-01-05 20:59:26 PST
(In reply to Alex Keybl [:akeybl] from comment #9)
> Hey Richard - sending over to you since I think you're heading up the
> investigation, and we don't want any tracked&unassigned bugs for Beta 10.

Thanks!

> Please take a look soon, or let us know if you don't think this warrants
> fixing for 10.

I would love to have it fixed for 10, but I have neither any leads nor any free time to chase it down. Android Sync has really taken over my entire life :/

And so: over to you, gps! I think you've got some free cycles for a week… let's see if you can reproduce and fix this as a break from reconciling :D

This bug smells like it's related to Bug 690170, and the further work that I'd like to do to tidy that up (see comments in that bug). Follow that trail and see where it leads?
Comment 11 Gregory Szorc [:gps] 2012-01-06 11:38:02 PST
Created attachment 586497 [details] [diff] [review]
Handle channel redirects, v1

Presumably the issues are due to the underlying channel being redirected. This patch updates both HTTP channel producers to handle nsIChannelEventSink.asyncOnChannelRedirect().

I also addressed some lingering comments from bug 690170.

This patch is *not* up for review because it doesn't include new tests. However, existing xpcshell tests all pass.

In related news, this is the first time I've mucked around with nsI-level HTTP code. So, I wouldn't be surprised if I were missing something obvious.
Comment 12 Boris Zbarsky [:bz] (still a bit busy) 2012-01-06 12:09:55 PST
Comment on attachment 586497 [details] [diff] [review]
Handle channel redirects, v1

Seems fine to me.
Comment 13 Gregory Szorc [:gps] 2012-01-09 17:31:28 PST
Created attachment 587213 [details] [diff] [review]
Better channel handling, v2

Changes:

* try..catch around channel.QueryInterface, just in case
* always channel.QueryInterface(Ci.nsIHttpChannel) on xpcom handlers because the QI might have been lost
* install nsIChannelEventSink.asyncOnChannelRedirect() on channel listener. For RESTRequest, update this.channel through it.

This includes new tests. However, I wasn't actually able to reproduce the failures, so saying "all the tests pass" doesn't mean as much as I would like it to. That said, all the tests pass.

The only thing I'm not certain about is whether I'm doing the right thing in RESTRequest.onStartRequest and onStopRequest if the QI fails. I experimented with calling onComplete() with an error, but it broke some random tests. I didn't want to go down the rabbit hole. If you tell me what I should do if the QI fails, I'll code it up and change existing tests (but I would prefer not to, since things like J-PAKE were failing and I'd rather not bloat scope).
Comment 14 Boris Zbarsky [:bz] (still a bit busy) 2012-01-09 21:36:47 PST
Comment on attachment 587213 [details] [diff] [review]
Better channel handling, v2

One option in onStartRequest is to just cancel the channel with some error code if it's not what you expect.

That will then call your onStopRequest and all that.  Note that the attached code likely calls _onComplete multiple times in those failure conditions...

I doubt this needs more input from me, in any case.  ;)
Comment 15 Gregory Szorc [:gps] 2012-01-10 14:24:11 PST
Created attachment 587473 [details] [diff] [review]
Better channel handling, v3

Now calling channel.cancel() if we can't QI, per feedback.
Comment 16 Gregory Szorc [:gps] 2012-01-12 10:23:26 PST
*** Bug 599965 has been marked as a duplicate of this bug. ***
Comment 17 Richard Newman [:rnewman] 2012-01-12 19:56:18 PST
Comment on attachment 587473 [details] [diff] [review]
Better channel handling, v3

Review of attachment 587473 [details] [diff] [review]:
-----------------------------------------------------------------

Requesting final skim feedback, because this is a little close to voodoo for someone not intimately familiar with the edge cases. Otherwise, this looks good to me. Looking forward to seeing a green s-c run :)
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2012-01-12 20:51:27 PST
Comment on attachment 587473 [details] [diff] [review]
Better channel handling, v3

This still looks fine, still modulo the multiple calls to _onComplete.  I have no idea whether those are ok.  Please check with someone who knows that code?
Comment 19 Gregory Szorc [:gps] 2012-01-13 15:19:18 PST
Created attachment 588542 [details] [diff] [review]
Better channel handling, v4

This sets this._onComplete = this._onProgress = null in resource.js. Pretty sure this addresses the lingering concerns. All tests pass.
Comment 20 Richard Newman [:rnewman] 2012-01-17 09:38:17 PST
Comment on attachment 588542 [details] [diff] [review]
Better channel handling, v4

Review of attachment 588542 [details] [diff] [review]:
-----------------------------------------------------------------

That'll do, but perhaps check for null and log prior to invoking onFoo?
Comment 21 Gregory Szorc [:gps] 2012-01-17 10:47:28 PST
Created attachment 589239 [details] [diff] [review]
Better channel handling, v5

Guard onFoo() calls.
Comment 22 Richard Newman [:rnewman] 2012-01-17 11:03:54 PST
Comment on attachment 589239 [details] [diff] [review]
Better channel handling, v5

Review of attachment 589239 [details] [diff] [review]:
-----------------------------------------------------------------

Just switch warns to errors.

::: services/sync/modules/resource.js
@@ +627,5 @@
>      let error = Components.Exception("Aborting due to channel inactivity.",
>                                       Cr.NS_ERROR_NET_TIMEOUT);
> +    if (!this._onComplete) {
> +      this._log.warn("Unexpected error: _onComplete not defined in " +
> +                     "abortRequest.");

this._log.error("_onComplete not defined in abortRequest.");

::: services/sync/modules/rest.js
@@ +366,5 @@
>      let error = Components.Exception("Aborting due to channel inactivity.",
>                                       Cr.NS_ERROR_NET_TIMEOUT);
> +    if (!this.onComplete) {
> +      this._log.warn("Unexpected error: onComplete not defined in " +
> +                     "abortTimeout.")

As above.
Comment 23 Gregory Szorc [:gps] 2012-01-17 11:52:36 PST
Pushed to s-c: https://hg.mozilla.org/services/services-central/rev/3d526c2ac8e8
Comment 24 Gregory Szorc [:gps] 2012-01-17 11:58:37 PST
Merged into m-c: https://hg.mozilla.org/mozilla-central/rev/3d526c2ac8e8
Comment 25 Gregory Szorc [:gps] 2012-01-17 18:16:51 PST
This is tracking for 10 and 11. However, I'm not sure how to proceed. I'm kinda nervous about asking for 10 approval given the short time before it is released. But, if this is a regression (it isn't apparent from looking at the bug fields), then I suppose 10 uplift should be considered.

rnewman/mconnor: what are your thoughts?
Comment 26 Richard Newman [:rnewman] 2012-01-17 22:57:20 PST
(In reply to Gregory Szorc [:gps] from comment #25)

> rnewman/mconnor: what are your thoughts?

It's not really a huge change, it's had extensive review, it's got some test coverage, and it's been QAed. I suggest a day or two baking on m-c, request approval for Aurora, and if nothing awful happens ask for Beta.

We did this work because we were tracking an Fx10 issue, and it's a fairly compact change that addresses probably our most frequent source of user bug reports.
Comment 27 Tracy Walker [:tracy] 2012-01-18 08:28:24 PST
verified with m-c build of 20120118

Reporter, please confirm you no longer see this.  At the moment, the fix is only on Nightly.
Comment 28 Gregory Szorc [:gps] 2012-01-19 10:49:53 PST
Created attachment 589910 [details] [diff] [review]
Better channel handling, v6

This is the actual patch that landed in m-c.

[Approval Request Comment]
Regression caused by (bug #): The issue has apparently been present for a long time.
User impact if declined: Continued Sync errors for some users (not quantified)
Testing completed (on m-c, etc.): Landed on m-c a few days ago. No new bugs filed.
Risk to taking this patch (and alternatives if risky): Only 2 weeks left until uplift.

I'm requesting approval to Aurora and Beta because this bug is tracking for them. However, I'm personally worried about pushing to Beta given the limited time to test before it goes to stable. As stated in previous comments, the bug has undergone review by multiple people, so it *should* be safe. I guess the release drivers have the last word.
Comment 29 ael 2012-01-19 13:12:10 PST
(In reply to Tracy Walker [:tracy] from comment #27)
> Reporter, please confirm you no longer see this.  At the moment, the fix is
> only on Nightly.

Since the bug is not reproducible on demand (it is intermittent) and I don't want to risk my main profile on nightly, I will report after it has landed in Aurora whether there are any occurrences within (say) a week.

Thanks for fixing this!
Comment 30 Alex Keybl [:akeybl] 2012-01-19 16:06:12 PST
Comment on attachment 589910 [details] [diff] [review]
Better channel handling, v6

[Triage Comment]
Thanks for the risk evaluation gps. Since we're hesitant about the fix, and the volume of reports has dropped since this was first tracked, we can let this ride the train.
Comment 31 Gregory Szorc [:gps] 2012-01-19 16:51:36 PST
*** Bug 669176 has been marked as a duplicate of this bug. ***
Comment 32 Gregory Szorc [:gps] 2012-01-19 16:53:23 PST
*** Bug 669589 has been marked as a duplicate of this bug. ***
Comment 33 Gregory Szorc [:gps] 2012-01-19 16:53:57 PST
*** Bug 672878 has been marked as a duplicate of this bug. ***
Comment 34 Gregory Szorc [:gps] 2012-01-19 16:55:54 PST
*** Bug 686468 has been marked as a duplicate of this bug. ***
Comment 35 Gregory Szorc [:gps] 2012-01-19 16:56:21 PST
*** Bug 686935 has been marked as a duplicate of this bug. ***
Comment 36 Gregory Szorc [:gps] 2012-01-19 16:57:56 PST
*** Bug 693531 has been marked as a duplicate of this bug. ***
Comment 37 Gregory Szorc [:gps] 2012-01-19 16:58:15 PST
*** Bug 693729 has been marked as a duplicate of this bug. ***
Comment 38 Gregory Szorc [:gps] 2012-01-19 16:58:50 PST
*** Bug 694676 has been marked as a duplicate of this bug. ***
Comment 39 Gregory Szorc [:gps] 2012-01-19 17:00:29 PST
*** Bug 705279 has been marked as a duplicate of this bug. ***
Comment 40 Gregory Szorc [:gps] 2012-01-19 17:01:15 PST
*** Bug 718043 has been marked as a duplicate of this bug. ***
Comment 41 Alex Keybl [:akeybl] 2012-01-22 14:55:54 PST
I see there have been a number of dupes for this bug now - do you feel that these reports warrant uplifting to Aurora 11, gps?
Comment 42 Richard Newman [:rnewman] 2012-01-22 16:35:00 PST
(In reply to Alex Keybl [:akeybl] from comment #41)
> I see there have been a number of dupes for this bug now - do you feel that
> these reports warrant uplifting to Aurora 11, gps?

Thumbs up from me, FWIW. It looks pretty apparent that my earlier partial fix didn't address the whole issue for all users. I'll let gps give his own perspective, of course.
Comment 43 Gregory Szorc [:gps] 2012-01-23 09:39:26 PST
(In reply to Alex Keybl [:akeybl] from comment #41)
> I see there have been a number of dupes for this bug now - do you feel that
> these reports warrant uplifting to Aurora 11, gps?

I wouldn't have originally requested approval if I didn't think this bug warranted uplift. So, yes, I would like to see this uplifted to whatever version(s) the release drivers find appropriate. Furthermore, if it isn't too late to push out a new Aurora version before 11 goes to Beta, I would like to request that (both for this and bug 714202).
Comment 44 Alex Keybl [:akeybl] 2012-01-23 11:52:08 PST
Comment on attachment 589910 [details] [diff] [review]
Better channel handling, v6

[Triage Comment]
Originally denied for Aurora due to low volume of duplicate reports. That has since changed, so approving for Aurora 11.
Comment 45 Gregory Szorc [:gps] 2012-01-23 12:12:12 PST
Pushed to Aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/9b079d6b0c33
Comment 46 Richard Newman [:rnewman] 2012-01-23 12:22:59 PST
Awesome! Looking forward to a drop in bug reports.
Comment 47 Gregory Szorc [:gps] 2012-01-26 10:39:20 PST
*** Bug 695077 has been marked as a duplicate of this bug. ***
Comment 48 Gregory Szorc [:gps] 2012-02-02 12:57:02 PST
*** Bug 723408 has been marked as a duplicate of this bug. ***
Comment 49 Tracy Walker [:tracy] 2012-02-10 08:35:26 PST
*** Bug 725950 has been marked as a duplicate of this bug. ***
Comment 50 ishrat.begum87 2012-02-17 03:33:44 PST
(In reply to ael from comment #0)
> Created attachment 576202 [details]
> error-1321984220130.txt
> 
> User Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0a2) Gecko/20111121
> Firefox/10.0a2
> Build ID: 20111121042024
> 
> 
> 
> Actual results:
> 
> Received error messages "Sync encountered an error while syncing: Unknown
> error". 
> Looking at the sync logs, the error is always in the tab sync part (it has
> happened three times). The attached log is typical.
> 
> NB This is on the latest Aurora.
Comment 51 Richard Newman [:rnewman] 2012-03-12 15:28:13 PDT
*** Bug 734870 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.