Crash in nsPipe::GetWriteSegment

RESOLVED FIXED in Firefox 51

Status

()

defect
--
critical
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: philipp, Assigned: bkelly)

Tracking

({crash, regression})

52 Branch
mozilla53
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox50 unaffected, firefox51 fixed, firefox52 fixed, firefox53 fixed)

Details

(crash signature)

Attachments

(2 attachments, 2 obsolete attachments)

Reporter

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-5d37c597-db97-49fb-a09f-dcd882170113.
=============================================================
Crashing Thread (11)
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsPipe::GetWriteSegment(char*&, unsigned int&) 	xpcom/io/nsPipe3.cpp:873
1 	xul.dll 	nsPipeOutputStream::WriteSegments(nsresult (*)(nsIOutputStream*, void*, char*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*) 	xpcom/io/nsPipe3.cpp:1781
2 	xul.dll 	mozilla::net::nsHttpTransaction::WriteSegments(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) 	netwerk/protocol/http/nsHttpTransaction.cpp:848
3 	xul.dll 	mozilla::net::Http2Session::WriteSegmentsAgain(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*, bool*) 	netwerk/protocol/http/Http2Session.cpp:2737
4 	xul.dll 	mozilla::net::nsHttpConnection::OnSocketReadable() 	netwerk/protocol/http/nsHttpConnection.cpp:1899
5 	xul.dll 	mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) 	netwerk/protocol/http/nsHttpConnection.cpp:2215
6 	xul.dll 	mozilla::net::nsSocketInputStream::OnSocketReady(nsresult) 	netwerk/base/nsSocketTransport2.cpp:290

these are crashes with crash reason MOZ_RELEASE_ASSERT(seg) that was added in bug 1134372. though that patch was uplifted to 51.0b as well, the crash isn't present there but only in 53.0a1 & 52.0a2 so far.
This will only show up in FF53 and FF52 because MOZ_DIAGNOSTIC_ASSERT is not enabled for release/beta.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Oh, it seems nsSegmentedBuffer::AppendSegment() is fallible.  So under OOM conditions it returns nullptr.  I can fix this to check for a nullptr return value, but the OOM will likely trigger immediately after from a different allocation.
I think the safest and easiest-to-uplift thing to do here is just properly report the OOM condition.
Attachment #8826696 - Flags: review?(nfroyd)
Note, I can't just restore the code prior to bug 1134372 because that treated this failure as a "would block" condition.  I don't think the pipe would necessarily wake back up if we did that for OOM conditions.
Comment on attachment 8826696 [details] [diff] [review]
Properly NS_ABORT_OOM() in nsPipe. r=froydnj

Review of attachment 8826696 [details] [diff] [review]:
-----------------------------------------------------------------

It looks like nsSegmentedBuffer::AppendNewSegment can also return nullptr in non-OOM conditions:

http://dxr.mozilla.org/mozilla-central/source/xpcom/io/nsSegmentedBuffer.cpp#30

Though I guess we initialize that with UINT32_MAX, so that condition is effectively a no-op.

I guess this is OK; it won't increase the crash rate, it'll just give us more accurate information about the cause of the crash, correct?  Though I'm curious why we wouldn't take alternate approaches here:

1) Make AppendNewSegment simply call moz_xmalloc for the last fallible allocation it does, so we could see OOM there.  This would have the benefit of avoiding nullptr crashes in a host of places.
2) Make this case return NS_ERROR_OUT_OF_MEMORY or something, and let pipe error reporting do its thing:

http://dxr.mozilla.org/mozilla-central/source/xpcom/io/nsPipe3.cpp#1798
3) Something else.

r=me assuming my sense of whether this keeps the crash rate the same is correct; if that's not the case, we should discuss, especially with a potential uplift to beta.
Attachment #8826696 - Flags: review?(nfroyd) → review+
Let me see if I can make the NS_ERROR_OUT_OF_MEMORY approach work.  I want to write a gtest.
This patch returns NS_ERROR_OUT_OF_MEMORY as suggested.
Attachment #8826696 - Attachment is obsolete: true
Attachment #8826714 - Flags: review?(nfroyd)
This patch adds a gtest that forces an OOM to verify how nsPipe handles it.  It crashes without the P1 and passes with the P1.

It does take a long time to run, though, so lets see how it does on try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=50b9b2661cae7a87f4a9e79ace31289c6cda61a8

Because its forcing OOM it can push the test machines into swap usage.
Attachment #8826715 - Flags: review?(nfroyd)
Apparently some of our test runners have more the 256GB of memory+swap:

https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1331038&attachment=8826715
Updated test to try allocating memory up to 1TB.  It seems some of our test runners have enough memory+swap to handle 256GB.
Attachment #8826715 - Attachment is obsolete: true
Attachment #8826715 - Flags: review?(nfroyd)
Attachment #8826741 - Flags: review?(nfroyd)
So, try shows that the gtest is not reliable on automation.  It causes the harness to timeout before the OOM happens.  The test runners probably have huge swap partitions making it hard to actually OOM them, but easy to slow them down.

I'm going to claim r+ for the NS_ERROR_OUT_OF_MEMORY patch given comment 5 and land it by itself.
Attachment #8826741 - Flags: review?(nfroyd)
Comment on attachment 8826714 [details] [diff] [review]
P1 Make nsPipe handle OOM conditions gracefully. r=froydnj

Again, claiming r+ based on comment 5.  I verified locally that this works.  I have a gtest, but the test runners make it too hard to OOM in automation so I can't actually land it with this patch.
Attachment #8826714 - Flags: review?(nfroyd) → review+

Comment 16

3 years ago
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6b716be52be0
Make nsPipe handle OOM conditions gracefully. r=froydnj
Comment on attachment 8826714 [details] [diff] [review]
P1 Make nsPipe handle OOM conditions gracefully. r=froydnj

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1134372
[User impact if declined]: nullptr de-ref crashes when the browser is near OOM conditions
[Is this code covered by automated tests?]: nsPipe is heavily tested, but its hard to test this particular path.  I wrote a gtest that forces OOM, but it times out in automation because our test runners are configured to make OOM very hard.
[Has the fix been verified in Nightly?]: I verified it locally, but it has not landed in m-c yet.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Minimal risk
[Why is the change risky/not risky?]: The patch is just a nullptr check that triggers an existing error path.  I verified locally that it works.
[String changes made/needed]: None
Attachment #8826714 - Flags: approval-mozilla-beta?
Attachment #8826714 - Flags: approval-mozilla-aurora?

Comment 18

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6b716be52be0
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8826714 [details] [diff] [review]
P1 Make nsPipe handle OOM conditions gracefully. r=froydnj

OOM crash fix, let's take this for aurora and for 51 RC. It should uplift before the beta to release merge on Monday.
Attachment #8826714 - Flags: approval-mozilla-beta?
Attachment #8826714 - Flags: approval-mozilla-beta+
Attachment #8826714 - Flags: approval-mozilla-aurora?
Attachment #8826714 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.