Closed
Bug 1397595
Opened 8 years ago
Closed 8 years ago
Intermittent extensions/cookie/test/browser_test_favicon.js | application crashed [@ nsPipe::AdvanceReadCursor(nsPipeReadState&, unsigned int)] after Assertion failure: aReadState.mAvailable >= aBytesRead
Categories
(Core :: XPCOM, defect, P5)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla57
People
(Reporter: intermittent-bug-filer, Assigned: bkelly)
References
Details
(Keywords: crash, intermittent-failure)
Crash Data
Attachments
(2 files, 4 obsolete files)
2.36 KB,
patch
|
bkelly
:
review+
|
Details | Diff | Splinter Review |
3.54 KB,
patch
|
bkelly
:
review+
|
Details | Diff | Splinter Review |
Assignee | ||
Comment 1•8 years ago
|
||
I looked a bit at this and I'm trying to figure out how this assertion could be firing. My best guess is that two different threads are trying to read the same pipe at the same time.
Maybe this is related to our new efforts to run OnDataAvailable off-main-thread? Is it possible that we dispatch more than one ODA to a thread pool like StreamTransportService at the same time?
Dragana, what do you think?
Flags: needinfo?(dd.mozilla)
Assignee | ||
Comment 2•8 years ago
|
||
As a short-term mitigation, perhaps we should abort the stream if this assertion does not hold.
Comment 3•8 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #1)
> I looked a bit at this and I'm trying to figure out how this assertion could
> be firing. My best guess is that two different threads are trying to read
> the same pipe at the same time.
>
> Maybe this is related to our new efforts to run OnDataAvailable
> off-main-thread? Is it possible that we dispatch more than one ODA to a
> thread pool like StreamTransportService at the same time?
>
> Dragana, what do you think?
I think we are only dispatching a new ODA after the previous us done.
Asking sc, he worked on the off-main-thread.
Flags: needinfo?(dd.mozilla) → needinfo?(schien)
Comment 4•8 years ago
|
||
From the backtrace, this is HTTP off-main-thread in chrome process (nsHttpChannel), so it is not using the content process HTTP OMT mechanism I introduced recently.
As @dragana said, new ODA is run after previous ODA callback is done. In addition, this case is ODA for imgRequest and Gecko only use single "ImageIO" thread to read the IO stream. It's not likely to hit the scenario @bkelly mentioned in comment #1.
Flags: needinfo?(schien)
Comment 5•8 years ago
|
||
I also noticed that assertion happens after "TEST-OK | extensions/cookie/test/browser_test_favicon.js | took 460ms".
https://treeherder.mozilla.org/logviewer.html#?job_id=128990395&repo=mozilla-inbound&lineNumber=28566
Could this indicate that the pipe is closed before the nsInputStreamReadyEvent is executed?
Assignee | ||
Comment 6•8 years ago
|
||
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #4)
> As @dragana said, new ODA is run after previous ODA callback is done. In
> addition, this case is ODA for imgRequest and Gecko only use single
> "ImageIO" thread to read the IO stream. It's not likely to hit the scenario
> @bkelly mentioned in comment #1.
Ok, thanks for verifying. I was a bit scared thinking about that possibility. I'll take a look at the pipe code a bit more here.
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Assignee | ||
Updated•8 years ago
|
Component: Networking: Cookies → XPCOM
Assignee | ||
Comment 7•8 years ago
|
||
I'm going to add some assertions here to try to narrow down where the state is going bad. We should leave this bug open in the meantime.
Keywords: leave-open
Assignee | ||
Comment 8•8 years ago
|
||
Hmm, or maybe we should just get rid of mAvailable. Its duplicate data and can be computed from `(mReadLimit - mReadCursor)`. If there is a bug in this book-keeping, lets just delete the book-keeping.
Assignee | ||
Comment 9•8 years ago
|
||
Assignee | ||
Comment 10•8 years ago
|
||
Initial try to see if we can tighten the asserts without blowing up the tree:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f6b8a9f57cbef9068432e7b44c3890ecc40a73cd
Assignee | ||
Updated•8 years ago
|
Keywords: leave-open
Assignee | ||
Comment 11•8 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #8)
> Hmm, or maybe we should just get rid of mAvailable. Its duplicate data and
> can be computed from `(mReadLimit - mReadCursor)`. If there is a bug in
> this book-keeping, lets just delete the book-keeping.
Actually, we can't do this because mAvailable is the total length of the pipe, not just the current segment length.
I think I might see the problem, though.
1. When we close the stream we call DrainInputStream() which sets mAvailable to 0.
2. We then call AdvanceReadSegment() as far as we to release the buffer memory. Usually this leaves us with mReadCursor == mReadLimit, but its not guaranteed.
3. When a Read() happens we use mReadCursor and mReadLimit to determine the length of data available. If they are not equal then we can get a read longer than mAvailable thanks to step 1 setting it to zero.
I think the solution here is to null out mReadCursor and mReadLimit when we drain the stream. We only drain when the stream is closed, so we pretty much want to avoid any further reads.
Assignee | ||
Comment 12•8 years ago
|
||
Assignee | ||
Comment 13•8 years ago
|
||
Assignee | ||
Comment 14•8 years ago
|
||
Comment on attachment 8905956 [details] [diff] [review]
P1 Convert existing nsPipe3.cpp MOZ_ASSERT's to use MOZ_DIAGNOSTIC_ASSERT. r=froydnj
This patch converts existing MOZ_ASSERT's to MOZ_DIAGNOSTIC_ASSERT instead. As far as I could tell all of these were relatively cheap checks. I think having some opt-build assertion checking is important here because pipe is used so frequently throughout the browser.
Attachment #8905956 -
Flags: review?(nfroyd)
Assignee | ||
Comment 15•8 years ago
|
||
Comment on attachment 8905958 [details] [diff] [review]
P2 Convert nsPipe3.cpp NS_ASSERTION() calls to hard assertions. r=froydnj
This patch converts "soft" NS_ASSERTION() calls into "hard" MOZ_DIAGNOSTIC_ASSERT and MOZ_ASSERT calls. I think it should be easier to do this now that we are not exposing pipe streams directly to legacy addons.
Attachment #8905958 -
Flags: review?(nfroyd)
Assignee | ||
Comment 16•8 years ago
|
||
Comment on attachment 8905992 [details] [diff] [review]
P3 Clear nsPipeInputStream mReadCursoe and mReadLimit in DrainInputStream() to keep state consistent with mAvailable. r=froydnj
This patch fixes a potential cause for the crash in this bug. The crash is triggering because we are advancing the read cursor by a number larger than mAvailable. The original amount to read and move the cursor is derived from the current segment's (mReadLimit - mReadCursor). So ultimately this bug is about mAvailable somehow getting set to a value inconsistent with those other pointers.
I believe the cause is that we were forcing mAvailable to zero when the stream was closed, but we did not necessarily clear the mReadLimit and mReadCursor. The result is a low probability of an additional read call seeing inconsistent state.
The solution is to clear mReadLimit and mReadCursor when force mAvailable to zero.
There is another low-probably problem here in that someone writing to the pipe could then update the state again after this. The next patch addresses that.
Attachment #8905992 -
Flags: review?(nfroyd)
Assignee | ||
Comment 17•8 years ago
|
||
Comment on attachment 8905993 [details] [diff] [review]
P4 Delay removing nsPipeInputStream from nsPipe list until DrainInputStream() actually runs. r=froydnj
> There is another low-probably problem here in that someone writing to the
> pipe could then update the state again after this. The next patch addresses
> that.
Sorry, this is not the lingering problem. This can't happen because when the stream is closed we remove it from nsPipe::mInputList. Since its not in the list it won't get updated any more.
The actual, problem, though is sometimes we don't call DrainInputStream immediately when its closed. We wait until the current active read completes and then we drain it.
This creates a situation where an input stream is not in nsPipe::mInputList, but does hold active pointers to the central buffer. There is some risk that a writer or pipe will release the segment being pointed to by the input stream leaving this pointers corrupt.
To remedy this situation I propose we only remove the stream of mInputList once DrainInputStream() actually runs successfully. If its deferred then we delay removing the stream as well. This ensures its state continues to be updated while its undrained.
Attachment #8905993 -
Flags: review?(nfroyd)
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 19•8 years ago
|
||
Comment on attachment 8905956 [details] [diff] [review]
P1 Convert existing nsPipe3.cpp MOZ_ASSERT's to use MOZ_DIAGNOSTIC_ASSERT. r=froydnj
Review of attachment 8905956 [details] [diff] [review]:
-----------------------------------------------------------------
Please hold off on committing this part, at least, until after 57 goes to beta, so if something does go wrong, people don't start panic'ing.
Attachment #8905956 -
Flags: review?(nfroyd) → review+
![]() |
||
Comment 20•8 years ago
|
||
Comment on attachment 8905958 [details] [diff] [review]
P2 Convert nsPipe3.cpp NS_ASSERTION() calls to hard assertions. r=froydnj
Review of attachment 8905958 [details] [diff] [review]:
-----------------------------------------------------------------
Same comments about 57 moving to beta apply here, too.
::: xpcom/io/nsPipe3.cpp
@@ +691,5 @@
>
> void
> nsPipe::AdvanceReadCursor(nsPipeReadState& aReadState, uint32_t aBytesRead)
> {
> + MOZ_DIAGNOSTIC_ASSERT(aBytesRead);
Nit: aBytesRead > 0, please.
@@ +898,5 @@
>
> void
> nsPipe::AdvanceWriteCursor(uint32_t aBytesWritten)
> {
> + MOZ_DIAGNOSTIC_ASSERT(aBytesWritten);
Nit: aBytesWritten > 0, please.
Attachment #8905958 -
Flags: review?(nfroyd) → review+
![]() |
||
Updated•8 years ago
|
Attachment #8905992 -
Flags: review?(nfroyd) → review+
![]() |
||
Comment 21•8 years ago
|
||
Comment on attachment 8905993 [details] [diff] [review]
P4 Delay removing nsPipeInputStream from nsPipe list until DrainInputStream() actually runs. r=froydnj
Review of attachment 8905993 [details] [diff] [review]:
-----------------------------------------------------------------
Makes sense to me. Is it possible to write a test for this?
::: xpcom/io/nsPipe3.cpp
@@ +1015,5 @@
>
> bool needNotify = false;
>
> + // OnInputException() can drain the stream and remove it from
> + // mInputList. So iterate over a temp list instead.
Does something continue to hold the input stream alive for us if it is removed from the list?
Attachment #8905993 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 22•8 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #21)
> Makes sense to me. Is it possible to write a test for this?
I think it would be tricky to reliably trigger. You would have to schedule separate threads to try to perform the operations at the same time and get the locking to fall just right.
Assignee | ||
Comment 23•8 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #19)
> Please hold off on committing this part, at least, until after 57 goes to
> beta, so if something does go wrong, people don't start panic'ing.
I'll move P1 and P2 to a separate bug. I'd like to get the other fixes in for FF57.
Assignee | ||
Comment 24•8 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #21)
> ::: xpcom/io/nsPipe3.cpp
> @@ +1015,5 @@
> >
> > bool needNotify = false;
> >
> > + // OnInputException() can drain the stream and remove it from
> > + // mInputList. So iterate over a temp list instead.
>
> Does something continue to hold the input stream alive for us if it is
> removed from the list?
Maybe yes, maybe no. Depends on if something external is holding the nsIInputStream.
Assignee | ||
Comment 25•8 years ago
|
||
Attachment #8905956 -
Attachment is obsolete: true
Attachment #8905958 -
Attachment is obsolete: true
Attachment #8905992 -
Attachment is obsolete: true
Attachment #8906796 -
Flags: review+
Assignee | ||
Comment 26•8 years ago
|
||
Attachment #8905993 -
Attachment is obsolete: true
Attachment #8906797 -
Flags: review+
Assignee | ||
Comment 27•8 years ago
|
||
Inbound is closed, so lets do another try build with just these two patches while we wait:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6cc7ed824a0fb47fa166e9547982fa37dc459bb7
Comment 28•8 years ago
|
||
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/59bb8f38be8d
P1 Clear nsPipeInputStream mReadCursoe and mReadLimit in DrainInputStream() to keep state consistent with mAvailable. r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e638bc2c4f3
P2 Delay removing nsPipeInputStream from nsPipe list until DrainInputStream() actually runs. r=froydnj
![]() |
||
Comment 29•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/59bb8f38be8d
https://hg.mozilla.org/mozilla-central/rev/3e638bc2c4f3
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Updated•8 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•