Closed Bug 1397595 Opened 7 years ago Closed 7 years ago

Intermittent extensions/cookie/test/browser_test_favicon.js | application crashed [@ nsPipe::AdvanceReadCursor(nsPipeReadState&, unsigned int)] after Assertion failure: aReadState.mAvailable >= aBytesRead

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla57

Tracking Flags:

Tracking

Status

firefox-esr52

---

wontfix

firefox55

---

wontfix

firefox56

---

wontfix

firefox57

---

fixed

People

(Reporter: intermittent-bug-filer, Assigned: bkelly)

References

Details

(Keywords: crash, intermittent-failure)

Crash Data

Attachments

(2 files, 4 obsolete files)

P1 Convert existing nsPipe3.cpp MOZ_ASSERT's to use MOZ_DIAGNOSTIC_ASSERT. r=froydnj 7 years ago Ben Kelly [:bkelly, not reviewing] 8.95 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
P2 Convert nsPipe3.cpp NS_ASSERTION() calls to hard assertions. r=froydnj 7 years ago Ben Kelly [:bkelly, not reviewing] 9.51 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
P3 Clear nsPipeInputStream mReadCursoe and mReadLimit in DrainInputStream() to keep state consistent with mAvailable. r=froydnj 7 years ago Ben Kelly [:bkelly, not reviewing] 2.38 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
P4 Delay removing nsPipeInputStream from nsPipe list until DrainInputStream() actually runs. r=froydnj 7 years ago Ben Kelly [:bkelly, not reviewing] 3.54 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
P1 Clear nsPipeInputStream mReadCursoe and mReadLimit in DrainInputStream() to keep state consistent with mAvailable. r=froydnj 7 years ago Ben Kelly [:bkelly, not reviewing] 2.36 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review
P2 Delay removing nsPipeInputStream from nsPipe list until DrainInputStream() actually runs. r=froydnj 7 years ago Ben Kelly [:bkelly, not reviewing] 3.54 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review

Treeherder Bug Filer

Reporter

Description

•

7 years ago

treeherder

Filed by: philringnalda [at] gmail.com

https://treeherder.mozilla.org/logviewer.html#?job_id=128990395&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/D03Epu47RjGBdu8ZmecT-Q/runs/0/artifacts/public/logs/live_backing.log

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 1

•

7 years ago

I looked a bit at this and I'm trying to figure out how this assertion could be firing.  My best guess is that two different threads are trying to read the same pipe at the same time.

Maybe this is related to our new efforts to run OnDataAvailable off-main-thread?  Is it possible that we dispatch more than one ODA to a thread pool like StreamTransportService at the same time?

Dragana, what do you think?

Flags: needinfo?(dd.mozilla)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 2

•

7 years ago

As a short-term mitigation, perhaps we should abort the stream if this assertion does not hold.

Dragana Damjanovic [:dragana]

Comment 3

•

7 years ago

(In reply to Ben Kelly [:bkelly] from comment #1)
> I looked a bit at this and I'm trying to figure out how this assertion could
> be firing.  My best guess is that two different threads are trying to read
> the same pipe at the same time.
> 
> Maybe this is related to our new efforts to run OnDataAvailable
> off-main-thread?  Is it possible that we dispatch more than one ODA to a
> thread pool like StreamTransportService at the same time?
> 
> Dragana, what do you think?

I think we are only dispatching a new ODA after the previous us done.

Asking sc, he worked on the off-main-thread.

Flags: needinfo?(dd.mozilla) → needinfo?(schien)

Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz)

Comment 4

•

7 years ago

From the backtrace, this is HTTP off-main-thread in chrome process (nsHttpChannel), so it is not using the content process HTTP OMT mechanism I introduced recently.

As @dragana said, new ODA is run after previous ODA callback is done. In addition, this case is ODA for imgRequest and Gecko only use single "ImageIO" thread to read the IO stream. It's not likely to hit the scenario @bkelly mentioned in comment #1.

Flags: needinfo?(schien)

Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz)

Comment 5

•

7 years ago

I also noticed that assertion happens after "TEST-OK | extensions/cookie/test/browser_test_favicon.js | took 460ms".
https://treeherder.mozilla.org/logviewer.html#?job_id=128990395&repo=mozilla-inbound&lineNumber=28566

Could this indicate that the pipe is closed before the nsInputStreamReadyEvent is executed?

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 6

•

7 years ago

(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #4)
> As @dragana said, new ODA is run after previous ODA callback is done. In
> addition, this case is ODA for imgRequest and Gecko only use single
> "ImageIO" thread to read the IO stream. It's not likely to hit the scenario
> @bkelly mentioned in comment #1.

Ok, thanks for verifying.  I was a bit scared thinking about that possibility.  I'll take a look at the pipe code a bit more here.

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

7 years ago

Assignee: nobody → bkelly

Status: NEW → ASSIGNED

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

7 years ago

Component: Networking: Cookies → XPCOM

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 7

•

7 years ago

I'm going to add some assertions here to try to narrow down where the state is going bad.  We should leave this bug open in the meantime.

Keywords: leave-open

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 8

•

7 years ago

Hmm, or maybe we should just get rid of mAvailable.  Its duplicate data and can be computed from `(mReadLimit - mReadCursor)`.  If there is a bug in this book-keeping, lets just delete the book-keeping.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 9

•

7 years ago

Attached patch P1 Convert existing nsPipe3.cpp MOZ_ASSERT's to use MOZ_DIAGNOSTIC_ASSERT. r=froydnj (obsolete) — Details — Splinter Review

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 10

•

7 years ago

Attached patch P2 Convert nsPipe3.cpp NS_ASSERTION() calls to hard assertions. r=froydnj (obsolete) — Details — Splinter Review

Initial try to see if we can tighten the asserts without blowing up the tree:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f6b8a9f57cbef9068432e7b44c3890ecc40a73cd

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

7 years ago

Keywords: leave-open

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 11

•

7 years ago

(In reply to Ben Kelly [:bkelly] from comment #8)
> Hmm, or maybe we should just get rid of mAvailable.  Its duplicate data and
> can be computed from `(mReadLimit - mReadCursor)`.  If there is a bug in
> this book-keeping, lets just delete the book-keeping.

Actually, we can't do this because mAvailable is the total length of the pipe, not just the current segment length.

I think I might see the problem, though.

1. When we close the stream we call DrainInputStream() which sets mAvailable to 0.
2. We then call AdvanceReadSegment() as far as we to release the buffer memory. Usually this leaves us with mReadCursor == mReadLimit, but its not guaranteed.
3. When a Read() happens we use mReadCursor and mReadLimit to determine the length of data available.  If they are not equal then we can get a read longer than mAvailable thanks to step 1 setting it to zero.

I think the solution here is to null out mReadCursor and mReadLimit when we drain the stream.  We only drain when the stream is closed, so we pretty much want to avoid any further reads.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 12

•

7 years ago

Attached patch P3 Clear nsPipeInputStream mReadCursoe and mReadLimit in DrainInputStream() to keep state consistent with mAvailable. r=froydnj (obsolete) — Details — Splinter Review

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 13

•

7 years ago

Attached patch P4 Delay removing nsPipeInputStream from nsPipe list until DrainInputStream() actually runs. r=froydnj (obsolete) — Details — Splinter Review

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e927b68389e2a1ff6bf751fb12677b48798f8002

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 14

•

7 years ago

Comment on attachment 8905956 [details] [diff] [review]
P1 Convert existing nsPipe3.cpp MOZ_ASSERT's to use MOZ_DIAGNOSTIC_ASSERT. r=froydnj

This patch converts existing MOZ_ASSERT's to MOZ_DIAGNOSTIC_ASSERT instead.  As far as I could tell all of these were relatively cheap checks.  I think having some opt-build assertion checking is important here because pipe is used so frequently throughout the browser.

Attachment #8905956 - Flags: review?(nfroyd)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 15

•

7 years ago

Comment on attachment 8905958 [details] [diff] [review]
P2 Convert nsPipe3.cpp NS_ASSERTION() calls to hard assertions. r=froydnj

This patch converts "soft" NS_ASSERTION() calls into "hard" MOZ_DIAGNOSTIC_ASSERT and MOZ_ASSERT calls.  I think it should be easier to do this now that we are not exposing pipe streams directly to legacy addons.

Attachment #8905958 - Flags: review?(nfroyd)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 16

•

7 years ago

Comment on attachment 8905992 [details] [diff] [review]
P3 Clear nsPipeInputStream mReadCursoe and mReadLimit in DrainInputStream() to keep state consistent with mAvailable. r=froydnj

This patch fixes a potential cause for the crash in this bug.  The crash is triggering because we are advancing the read cursor by a number larger than mAvailable.  The original amount to read and move the cursor is derived from the current segment's (mReadLimit - mReadCursor).  So ultimately this bug is about mAvailable somehow getting set to a value inconsistent with those other pointers.

I believe the cause is that we were forcing mAvailable to zero when the stream was closed, but we did not necessarily clear the mReadLimit and mReadCursor.  The result is a low probability of an additional read call seeing inconsistent state.

The solution is to clear mReadLimit and mReadCursor when force mAvailable to zero.

There is another low-probably problem here in that someone writing to the pipe could then update the state again after this.  The next patch addresses that.

Attachment #8905992 - Flags: review?(nfroyd)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 17

•

7 years ago

Comment on attachment 8905993 [details] [diff] [review]
P4 Delay removing nsPipeInputStream from nsPipe list until DrainInputStream() actually runs. r=froydnj

> There is another low-probably problem here in that someone writing to the
> pipe could then update the state again after this.  The next patch addresses
> that.

Sorry, this is not the lingering problem.  This can't happen because when the stream is closed we remove it from nsPipe::mInputList.  Since its not in the list it won't get updated any more.

The actual, problem, though is sometimes we don't call DrainInputStream immediately when its closed.  We wait until the current active read completes and then we drain it.

This creates a situation where an input stream is not in nsPipe::mInputList, but does hold active pointers to the central buffer.  There is some risk that a writer or pipe will release the segment being pointed to by the input stream leaving this pointers corrupt.

To remedy this situation I propose we only remove the stream of mInputList once DrainInputStream() actually runs successfully.  If its deferred then we delay removing the stream as well.  This ensures its state continues to be updated while its undrained.

Attachment #8905993 - Flags: review?(nfroyd)

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

7 years ago

Blocks: 1395308

Comment hidden (Intermittent Failures Robot)

2 failures in 924 pushes (0.002 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 2

Platform breakdown:
* osx-10-10: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1397595&startday=2017-09-04&endday=2017-09-10&tree=all

Nathan Froyd [:froydnj]

Comment 19

•

7 years ago

Comment on attachment 8905956 [details] [diff] [review]
P1 Convert existing nsPipe3.cpp MOZ_ASSERT's to use MOZ_DIAGNOSTIC_ASSERT. r=froydnj

Review of attachment 8905956 [details] [diff] [review]:
-----------------------------------------------------------------

Please hold off on committing this part, at least, until after 57 goes to beta, so if something does go wrong, people don't start panic'ing.

Attachment #8905956 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Comment 20

•

7 years ago

Comment on attachment 8905958 [details] [diff] [review]
P2 Convert nsPipe3.cpp NS_ASSERTION() calls to hard assertions. r=froydnj

Review of attachment 8905958 [details] [diff] [review]:
-----------------------------------------------------------------

Same comments about 57 moving to beta apply here, too.

::: xpcom/io/nsPipe3.cpp
@@ +691,5 @@
>  
>  void
>  nsPipe::AdvanceReadCursor(nsPipeReadState& aReadState, uint32_t aBytesRead)
>  {
> +  MOZ_DIAGNOSTIC_ASSERT(aBytesRead);

Nit: aBytesRead > 0, please.

@@ +898,5 @@
>  
>  void
>  nsPipe::AdvanceWriteCursor(uint32_t aBytesWritten)
>  {
> +  MOZ_DIAGNOSTIC_ASSERT(aBytesWritten);

Nit: aBytesWritten > 0, please.

Attachment #8905958 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Updated

•

7 years ago

Attachment #8905992 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Comment 21

•

7 years ago

Comment on attachment 8905993 [details] [diff] [review]
P4 Delay removing nsPipeInputStream from nsPipe list until DrainInputStream() actually runs. r=froydnj

Review of attachment 8905993 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense to me.  Is it possible to write a test for this?

::: xpcom/io/nsPipe3.cpp
@@ +1015,5 @@
>  
>      bool needNotify = false;
>  
> +    // OnInputException() can drain the stream and remove it from
> +    // mInputList.  So iterate over a temp list instead.

Does something continue to hold the input stream alive for us if it is removed from the list?

Attachment #8905993 - Flags: review?(nfroyd) → review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 22

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #21)

> Makes sense to me.  Is it possible to write a test for this?

I think it would be tricky to reliably trigger.  You would have to schedule separate threads to try to perform the operations at the same time and get the locking to fall just right.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 23

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #19)
> Please hold off on committing this part, at least, until after 57 goes to
> beta, so if something does go wrong, people don't start panic'ing.

I'll move P1 and P2 to a separate bug.  I'd like to get the other fixes in for FF57.

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

7 years ago

Blocks: 1398942

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 24

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #21)
> ::: xpcom/io/nsPipe3.cpp
> @@ +1015,5 @@
> >  
> >      bool needNotify = false;
> >  
> > +    // OnInputException() can drain the stream and remove it from
> > +    // mInputList.  So iterate over a temp list instead.
> 
> Does something continue to hold the input stream alive for us if it is
> removed from the list?

Maybe yes, maybe no.  Depends on if something external is holding the nsIInputStream.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 25

•

7 years ago

Attached patch P1 Clear nsPipeInputStream mReadCursoe and mReadLimit in DrainInputStream() to keep state consistent with mAvailable. r=froydnj — Details — Splinter Review

Attachment #8905956 - Attachment is obsolete: true

Attachment #8905958 - Attachment is obsolete: true

Attachment #8905992 - Attachment is obsolete: true

Attachment #8906796 - Flags: review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 26

•

7 years ago

Attached patch P2 Delay removing nsPipeInputStream from nsPipe list until DrainInputStream() actually runs. r=froydnj — Details — Splinter Review

Attachment #8905993 - Attachment is obsolete: true

Attachment #8906797 - Flags: review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 27

•

7 years ago

Inbound is closed, so lets do another try build with just these two patches while we wait:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6cc7ed824a0fb47fa166e9547982fa37dc459bb7

Pulsebot

Comment 28

•

7 years ago

Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/59bb8f38be8d
P1 Clear nsPipeInputStream mReadCursoe and mReadLimit in DrainInputStream() to keep state consistent with mAvailable. r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e638bc2c4f3
P2 Delay removing nsPipeInputStream from nsPipe list until DrainInputStream() actually runs. r=froydnj

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 29

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/59bb8f38be8d
https://hg.mozilla.org/mozilla-central/rev/3e638bc2c4f3

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox55: --- → wontfix

status-firefox56: --- → wontfix

status-firefox-esr52: --- → wontfix

You need to log in before you can comment on or make changes to this bug.