Closed Bug 1331038 Opened 5 years ago Closed 5 years ago
Crash in ns
Pipe::Get Write Segment
934 bytes, patch
|Details | Diff | Splinter Review|
2.67 KB, patch
|Details | Diff | Splinter Review|
This bug was filed from the Socorro interface and is report bp-5d37c597-db97-49fb-a09f-dcd882170113. ============================================================= Crashing Thread (11) Frame Module Signature Source 0 xul.dll nsPipe::GetWriteSegment(char*&, unsigned int&) xpcom/io/nsPipe3.cpp:873 1 xul.dll nsPipeOutputStream::WriteSegments(nsresult (*)(nsIOutputStream*, void*, char*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*) xpcom/io/nsPipe3.cpp:1781 2 xul.dll mozilla::net::nsHttpTransaction::WriteSegments(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) netwerk/protocol/http/nsHttpTransaction.cpp:848 3 xul.dll mozilla::net::Http2Session::WriteSegmentsAgain(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*, bool*) netwerk/protocol/http/Http2Session.cpp:2737 4 xul.dll mozilla::net::nsHttpConnection::OnSocketReadable() netwerk/protocol/http/nsHttpConnection.cpp:1899 5 xul.dll mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) netwerk/protocol/http/nsHttpConnection.cpp:2215 6 xul.dll mozilla::net::nsSocketInputStream::OnSocketReady(nsresult) netwerk/base/nsSocketTransport2.cpp:290 these are crashes with crash reason MOZ_RELEASE_ASSERT(seg) that was added in bug 1134372. though that patch was uplifted to 51.0b as well, the crash isn't present there but only in 53.0a1 & 52.0a2 so far.
This will only show up in FF53 and FF52 because MOZ_DIAGNOSTIC_ASSERT is not enabled for release/beta.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Oh, it seems nsSegmentedBuffer::AppendSegment() is fallible. So under OOM conditions it returns nullptr. I can fix this to check for a nullptr return value, but the OOM will likely trigger immediately after from a different allocation.
I think the safest and easiest-to-uplift thing to do here is just properly report the OOM condition.
Attachment #8826696 - Flags: review?(nfroyd)
Note, I can't just restore the code prior to bug 1134372 because that treated this failure as a "would block" condition. I don't think the pipe would necessarily wake back up if we did that for OOM conditions.
Comment on attachment 8826696 [details] [diff] [review] Properly NS_ABORT_OOM() in nsPipe. r=froydnj Review of attachment 8826696 [details] [diff] [review]: ----------------------------------------------------------------- It looks like nsSegmentedBuffer::AppendNewSegment can also return nullptr in non-OOM conditions: http://dxr.mozilla.org/mozilla-central/source/xpcom/io/nsSegmentedBuffer.cpp#30 Though I guess we initialize that with UINT32_MAX, so that condition is effectively a no-op. I guess this is OK; it won't increase the crash rate, it'll just give us more accurate information about the cause of the crash, correct? Though I'm curious why we wouldn't take alternate approaches here: 1) Make AppendNewSegment simply call moz_xmalloc for the last fallible allocation it does, so we could see OOM there. This would have the benefit of avoiding nullptr crashes in a host of places. 2) Make this case return NS_ERROR_OUT_OF_MEMORY or something, and let pipe error reporting do its thing: http://dxr.mozilla.org/mozilla-central/source/xpcom/io/nsPipe3.cpp#1798 3) Something else. r=me assuming my sense of whether this keeps the crash rate the same is correct; if that's not the case, we should discuss, especially with a potential uplift to beta.
Attachment #8826696 - Flags: review?(nfroyd) → review+
Let me see if I can make the NS_ERROR_OUT_OF_MEMORY approach work. I want to write a gtest.
This patch returns NS_ERROR_OUT_OF_MEMORY as suggested.
This patch adds a gtest that forces an OOM to verify how nsPipe handles it. It crashes without the P1 and passes with the P1. It does take a long time to run, though, so lets see how it does on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=50b9b2661cae7a87f4a9e79ace31289c6cda61a8 Because its forcing OOM it can push the test machines into swap usage.
Apparently some of our test runners have more the 256GB of memory+swap: https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1331038&attachment=8826715
Lets see the test runners deal with that! https://treeherder.mozilla.org/#/jobs?repo=try&revision=abe0ee5f59ab3086a4cc206c05c88b548f7625ec
Updated test to try allocating memory up to 1TB. It seems some of our test runners have enough memory+swap to handle 256GB.
So, try shows that the gtest is not reliable on automation. It causes the harness to timeout before the OOM happens. The test runners probably have huge swap partitions making it hard to actually OOM them, but easy to slow them down. I'm going to claim r+ for the NS_ERROR_OUT_OF_MEMORY patch given comment 5 and land it by itself.
Comment on attachment 8826714 [details] [diff] [review] P1 Make nsPipe handle OOM conditions gracefully. r=froydnj Again, claiming r+ based on comment 5. I verified locally that this works. I have a gtest, but the test runners make it too hard to OOM in automation so I can't actually land it with this patch.
Attachment #8826714 - Flags: review?(nfroyd) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/6b716be52be0 Make nsPipe handle OOM conditions gracefully. r=froydnj
Comment on attachment 8826714 [details] [diff] [review] P1 Make nsPipe handle OOM conditions gracefully. r=froydnj Approval Request Comment [Feature/Bug causing the regression]: Bug 1134372 [User impact if declined]: nullptr de-ref crashes when the browser is near OOM conditions [Is this code covered by automated tests?]: nsPipe is heavily tested, but its hard to test this particular path. I wrote a gtest that forces OOM, but it times out in automation because our test runners are configured to make OOM very hard. [Has the fix been verified in Nightly?]: I verified it locally, but it has not landed in m-c yet. [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Minimal risk [Why is the change risky/not risky?]: The patch is just a nullptr check that triggers an existing error path. I verified locally that it works. [String changes made/needed]: None
Comment on attachment 8826714 [details] [diff] [review] P1 Make nsPipe handle OOM conditions gracefully. r=froydnj OOM crash fix, let's take this for aurora and for 51 RC. It should uplift before the beta to release merge on Monday.
You need to log in before you can comment on or make changes to this bug.