Closed Bug 1259305 Opened 9 years ago Closed 8 years ago

Intermittent browser_download.js | application crashed [@ mozilla::net::ChannelEventQueue::Resume] after Assertion failure: mSuspendCount > 0, at /home/worker/workspace/build/src/netwerk/ipc/ChannelEventQueue.cpp:64

Categories

(Core :: Networking, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- fixed

People

(Reporter: philor, Assigned: dragana)

References

Details

(Keywords: assertion, intermittent-failure, Whiteboard: [necko-active])

Attachments

(1 file, 3 obsolete files)

Seems like a problem in diversion.
Flags: needinfo?(jduell.mcbugs)
I am going to take a look at this, I was making changes in this code recently.
Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
[Tracking Requested - why for this release]:
blocking-b2g: --- → 2.2r?
Flags: a11y-review?
blocking-b2g: 2.2r? → ---
Flags: a11y-review?
Blocks: 1201170
Attached patch bug_1259305_v1.patch (obsolete) — Splinter Review
Attachment #8734682 - Flags: review?(jduell.mcbugs)
Whiteboard: [necko-active]
Attachment #8734682 - Flags: review?(jduell.mcbugs)
Attached patch bug_1259305_v1.patch (obsolete) — Splinter Review
I am asking for a review, but I am open for suggestions. Suspending divert gets complicated :) a different solution that I was thinking of is to inform nsHttpChannel that it has a HttpChannelParent from the beginning (not only during diversion) and to suspend parent as weel all the time :)
Attachment #8734682 - Attachment is obsolete: true
Attachment #8736765 - Flags: review?(jduell.mcbugs)
Flags: needinfo?(jduell.mcbugs)
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Comment on attachment 8736765 [details] [diff] [review] bug_1259305_v1.patch Stealing review from jason on his request.
Attachment #8736765 - Flags: review?(jduell.mcbugs) → review?(honzab.moz)
Dragana, I need some background on this patch. Can you please explain: - cause of this bug - what the patch exactly does and WHY I'm missing good comments on rational in both bugzilla and the patch. Thanks.
Flags: needinfo?(dd.mozilla)
Comment on attachment 8736765 [details] [diff] [review] bug_1259305_v1.patch The conclusion here is that Dragana is going to find out why this patch has been introduced and if it's still needed. Until then I will drop the r? flag.
Attachment #8736765 - Flags: review?(honzab.moz)
Before bug 1201170 it was not possible to suspend channel during diversion. bug 1201170 made possible this, but it has a bug that this patch should fix. So to suspend OnDatAv. during divert we need to suspend mEventQ at parent. nsHttpChannelParent::suspend is not call during diver but only the nsHttpChannel. Bug 1201170 added this MessageDiversionStart/MessageDiversionStop and mParentChannel->SuspendMessageDiversion(); The bug from the bug 1201170 is that suspend count in nsHttpChannel and HttpChannelParent::mEventQ was not the same. so when divert start we need to suspend mEventQ as many time as nsHttpChannel is suspended minus 1 (this one is "suspend nsHttpChannel during diversion" that is called by HttpChannelParent it elf)
Flags: needinfo?(dd.mozilla)
Attached patch bug_1259305_v1.patch (obsolete) — Splinter Review
Attachment #8736765 - Attachment is obsolete: true
Attachment #8801157 - Flags: review?(honzab.moz)
(In reply to Dragana Damjanovic [:dragana] from comment #14) > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=2d6ef4e2b94acdbefd27576c32efc485ecae334b Is the test still failing or not with this patch? The try looks not more orange than usual, but please star the runs so that we are sure this 1) fixes the problem 2) doesn't break anything else.
Comment on attachment 8801157 [details] [diff] [review] bug_1259305_v1.patch Review of attachment 8801157 [details] [diff] [review]: ----------------------------------------------------------------- In general, I really strongly dislike flags of this type (msuspendedfor this or that..) however, the patch makes somewhat sense (spent an hour studying the code, I'm probably not the right reviewer for this..) I think only the comments need tuning here. disclaimer: didn't test locally, nor I'm super expert to the diversion and synthetization code. ::: netwerk/protocol/http/HttpChannelParent.cpp @@ +577,5 @@ > { > LOG(("HttpChannelParent::RecvSuspend [this=%p]\n", this)); > > if (mChannel) { > + mChannel->Suspend(); can you add a comment why we call Suspend() and not SuspendInternal() here? @@ +588,5 @@ > { > LOG(("HttpChannelParent::RecvResume [this=%p]\n", this)); > > if (mChannel) { > + mChannel->Resume(); similar @@ +1361,5 @@ > MOZ_ASSERT(!mDivertingFromChild, "Already suspended for diversion!"); > return NS_ERROR_UNEXPECTED; > } > > + // MessageDiversionStarted call will suspend mEventQ as meny time as the as many times @@ +1362,5 @@ > return NS_ERROR_UNEXPECTED; > } > > + // MessageDiversionStarted call will suspend mEventQ as meny time as the > + // channel has been suspended, so that channel and this queue are sync. are in sync ? @@ +1364,5 @@ > > + // MessageDiversionStarted call will suspend mEventQ as meny time as the > + // channel has been suspended, so that channel and this queue are sync. > + // to suspend nsHttpChannel but not to suspend mEventQ wi will call > + // mChannel->SuspendInternal() otherwise wi will call mChannel->Suspend() s/wi/we/ ? also, this comment doesn't make much sense to me nor I see the code you are referring to, tho it seems to be the most important part. @@ +1374,5 @@ > // been called and thus the channel may not be pending. If we've already > // automatically suspended after synthesizing the response, then we don't > // need to suspend again here. > if (!mSuspendAfterSynthesizeResponse) { > + // Suspend only nsHttpChannel but do not suspend mEventQ. say why please @@ +1384,5 @@ > // the response. > mSuspendedForDiversion = true; > + > + // To be consistent when mSuspendAfterSynthesizeResponse suspends the channel > + // we do not need to suspend mEventQ so wi will resume it. Is it so that this code cannot execute BEFORE HttpChannelParent::ResponseSynthesized()? Could we assert that with mWillSynthesizeResponse == false if possible? (with some explanatory comment?) Your comment should be a bit improved. As I understand, when mSuspendAfterSynthesizeResponse is true, the channel is suspended now (better said, was suspended after the synthetization has been done, based on the premise above). The mSuspendedForDiversion flag is here because mChannel->SuspendInternal() may fail when after onstartrequest notification (mPending on nshttpchannel is false), so we must not call resume() on the httpchannel later in that case, would fail or cause some kind of inconsistancy? (Honestly, I think that adding |NS_ENSURE_TRUE(mIsPending, NS_ERROR_NOT_AVAILABLE);| also to resumeinternal() would do... but that's a differnt bug/task.) So, you do mEventQ->Resume() here because the event queue has been suspended one more time by call to mChannel->MessageDiversionStarted(this) (which suspends the message queue the same number of times the channel is suspended). But since we do not actually suspend the channel here, we want to lower the number of mEventQ suspension times by one. OK.. makes sense :D Oh yeah...
Attachment #8801157 - Flags: review?(honzab.moz) → review+
comments updated.
Attachment #8801157 - Attachment is obsolete: true
Attachment #8807097 - Flags: review+
Keywords: checkin-needed
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/3219bfc7b776 If nsHttpChannel has been suspended before divert to parent starts, suspend message diversion. r=mayhemer
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Is it worth taking this on Aurora for 51? If so, please nominate :)
Flags: needinfo?(dd.mozilla)
It was not causing any issues til now, because we do not suspend channels that early at the moment. So no need to uplift.
Flags: needinfo?(dd.mozilla)
The latest error message doesn't matched with the original signature. @philor can you help create a new one and move the OrangeFactor linkage to that? INFO - TEST-START | dom/workers/test/serviceworkers/browser_download.js INFO - GECKO(3190) | JavaScript error: jar:file:///builds/worker/workspace/build/application/firefox/omni.ja!/components/nsHelperAppDlg.js, line 538: TypeError: dialog is null INFO - TEST-INFO | started process screentopng INFO - TEST-INFO | screentopng: exit 0 INFO - Buffered messages logged at 17:40:55 INFO - Global property added while loading chrome://browser/content/browser-thumbnails.js: getTopSiteURLs INFO - Console message: [JavaScript Warning: "Use of nsIFile in content process is deprecated." {file: "resource://gre/modules/FileUtils.jsm" line: 174}] INFO - Buffered messages logged at 17:40:57 INFO - Console message: [JavaScript Warning: "Use of nsIFile in content process is deprecated." {file: "resource://gre/modules/FileUtils.jsm" line: 174}] INFO - Buffered messages logged at 17:40:58 INFO - Console message: [JavaScript Warning: "Unknown pseudo-class or pseudo-element ‘-moz-tree-line’. Ruleset ignored due to bad selector." {file: "chrome://global/content/xul.css" line: 629}] INFO - Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://mochi.test:8888/browser/dom/workers/test/serviceworkers/download/window.html" line: 0}] INFO - Buffered messages logged at 17:41:01 INFO - Console message: [JavaScript Error: "TypeError: dialog is null" {file: "jar:file:///builds/worker/workspace/build/application/firefox/omni.ja!/components/nsHelperAppDlg.js" line: 538}] INFO - @jar:file:///builds/worker/workspace/build/application/firefox/omni.ja!/components/nsHelperAppDlg.js:538:1 INFO - INFO - Buffered messages finished INFO - TEST-UNEXPECTED-FAIL | dom/workers/test/serviceworkers/browser_download.js | Test timed out -
Flags: needinfo?(philringnalda)
One misstarred timeout on a try push, which might have been what he relanded or might have been a partial change, or might have been what was backed out? No thanks, I'll wait for it to happen somewhere other than try before filing that.
Flags: needinfo?(philringnalda)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: