Closed Bug 1246761 Opened 8 years ago Closed 8 years ago

[h2] Assertion failure: *countWritten > 0 (bad writer), at netwerk/protocol/http/nsHttpTransaction.cpp:770 visiting https://sarahlicity.co.uk

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: keeler, Assigned: u408661)

Details

(Whiteboard: [necko-active])

Attachments

(4 files)

STR: visit https://sarahlicity.co.uk

Assertion failure: *countWritten > 0 (bad writer), at netwerk/protocol/http/nsHttpTransaction.cpp:770

#0  0x00007fffede57a5c in mozilla::net::nsHttpTransaction::WritePipeSegment(nsIOutputStream*, void*, char*, unsigned int, unsigned int, unsigned int*) (stream=0x7fffbfa822d8, closure=0x7fffbfa77000, buf=0x7fffb81c5000 "HTTP/2.0 200\n\ndate: Mon, 08 Feb 2016 20:25:48 GMT\n\nserver: Apache\n\nx-frame-options: SAMEORIGIN\n\nx-content-type-options: nosniff\n\nx-xss-protection: 1; mode=block\n\ncontent-security-policy: font-src 'sel"..., offset=0, count=32768, countWritten=0x7fffd8bfc050) at /home/keeler/mozilla-central/netwerk/protocol/http/nsHttpTransaction.cpp:770
#1  0x00007fffed93ae74 in nsPipeOutputStream::WriteSegments(nsresult (*)(nsIOutputStream*, void*, char*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*) (this=0x7fffbfa822d8, aReader=0x7fffede57880 <mozilla::net::nsHttpTransaction::WritePipeSegment(nsIOutputStream*, void*, char*, unsigned int, unsigned int, unsigned int*)>, aClosure=0x7fffbfa77000, aCount=32768, aWriteCount=0x7fffd8bfd390)
    at /home/keeler/mozilla-central/xpcom/io/nsPipe3.cpp:1649
#2  0x00007fffede5919a in mozilla::net::nsHttpTransaction::WriteSegments(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) (this=0x7fffbfa77000, writer=0x7fffc8d461b8, count=32768, countWritten=0x7fffd8bfd390) at /home/keeler/mozilla-central/netwerk/protocol/http/nsHttpTransaction.cpp:817
#3  0x00007fffedd45f48 in mozilla::net::Http2Stream::WriteSegments(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) (this=0x7fffc8d461b0, writer=0x7fffbf20d820, count=32768, countWritten=0x7fffd8bfd390) at /home/keeler/mozilla-central/netwerk/protocol/http/Http2Stream.cpp:317
#4  0x00007fffedd562ca in mozilla::net::Http2Session::WriteSegments(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) (this=0x7fffbf20d800, writer=0x7fffc9b5a608, count=32768, countWritten=0x7fffd8bfd390) at /home/keeler/mozilla-central/netwerk/protocol/http/Http2Session.cpp:2678
#5  0x00007fffede1a1ad in mozilla::net::nsHttpConnection::OnSocketReadable() (this=0x7fffc9b5a600)
    at /home/keeler/mozilla-central/netwerk/protocol/http/nsHttpConnection.cpp:1769
#6  0x00007fffede1ae15 in mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) (this=0x7fffc9b5a600, in=0x7fffbfa77a28)
    at /home/keeler/mozilla-central/netwerk/protocol/http/nsHttpConnection.cpp:2082
#7  0x00007fffede1b06f in non-virtual thunk to mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) ()
    at /home/keeler/mozilla-central/netwerk/protocol/http/nsHttpConnection.cpp:2087
#8  0x00007fffedb3ac93 in nsSocketInputStream::OnSocketReady(nsresult) (this=0x7fffbfa77a28, condition=nsresult::NS_OK)
    at /home/keeler/mozilla-central/netwerk/base/nsSocketTransport2.cpp:287
#9  0x00007fffedb42476 in nsSocketTransport::OnSocketReady(PRFileDesc*, short) (this=0x7fffbfa77800, fd=0x7fffbf372bb0, outFlags=1)
    at /home/keeler/mozilla-central/netwerk/base/nsSocketTransport2.cpp:1907
#10 0x00007fffedb4b73d in nsSocketTransportService::DoPollIteration(bool, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*) (this=0x7fffe06bc500, wait=true, pollDuration=0x7fffd8bfd6d0) at /home/keeler/mozilla-central/netwerk/base/nsSocketTransportService2.cpp:1096
#11 0x00007fffedb4ab00 in nsSocketTransportService::Run() (this=0x7fffe06bc500) at /home/keeler/mozilla-central/netwerk/base/nsSocketTransportService2.cpp:869
#12 0x00007fffedb4bb2c in non-virtual thunk to nsSocketTransportService::Run() ()
    at /home/keeler/mozilla-central/netwerk/base/nsSocketTransportService2.cpp:980
#13 0x00007fffed97acc7 in nsThread::ProcessNextEvent(bool, bool*) (this=0x7fffe060a000, aMayWait=false, aResult=0x7fffd8bfd90e)
    at /home/keeler/mozilla-central/xpcom/threads/nsThread.cpp:1018
#14 0x00007fffed9f76d7 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7fffe060a000, aMayWait=false)
    at /home/keeler/mozilla-central/xpcom/glue/nsThreadUtils.cpp:297
#15 0x00007fffee089d18 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) (this=0x7fffd9447e40, aDelegate=0x7fffe06a4ec0)
    at /home/keeler/mozilla-central/ipc/glue/MessagePump.cpp:326
#16 0x00007fffedfc3895 in MessageLoop::RunInternal() (this=0x7fffe06a4ec0) at /home/keeler/mozilla-central/ipc/chromium/src/base/message_loop.cc:234
#17 0x00007fffedfc37c5 in MessageLoop::RunHandler() (this=0x7fffe06a4ec0) at /home/keeler/mozilla-central/ipc/chromium/src/base/message_loop.cc:227
#18 0x00007fffedfc379d in MessageLoop::Run() (this=0x7fffe06a4ec0) at /home/keeler/mozilla-central/ipc/chromium/src/base/message_loop.cc:201
#19 0x00007fffed978aff in nsThread::ThreadFunc(void*) (aArg=0x7fffe060a000) at /home/keeler/mozilla-central/xpcom/threads/nsThread.cpp:411
#20 0x00007ffff7fc02ab in _pt_root (arg=0x7fffe066f4a0) at /home/keeler/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:216
#21 0x00007ffff7bc7555 in start_thread (arg=0x7fffd8bfe700) at pthread_create.c:333
#22 0x00007fffe9bfdb9d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
I have a sneaking suspicion (though no confirmation yet) that this is Yet Another Edge Case when handling padded frames (at least, it's consistent with that). I'll take a look this week.
David, is this something that's reproducable every time for you? I have not yet been able to hit the assertion, and I'm seeing no padding on the h2 connections in my http logs.

If you can reproduce, it'd be nice to get an http log from a load that hits the assert.
Flags: needinfo?(dkeeler)
The preloaded HSTS automatic update script was encountering this, but it looks like the site isn't behaving the same way (or another change fixed this). I'll reopen if I encounter it again.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(dkeeler)
Resolution: --- → WORKSFORME
Attached file req.js
I figured out how to reproduce this - if you run the attached file with xpcshell without having attempted to connect to the site in question in a while (maybe 15-30 minutes?) it will fairly reliably hit the assertion. I managed to get a packet capture as well, so I'll attach that too. (I imagine rr would be helpful in debugging this: http://rr-project.org/ )
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
(In reply to David Keeler [:keeler] (use needinfo?) from comment #4)
> Created attachment 8720466 [details]
> req.js
> 
> I figured out how to reproduce this - if you run the attached file with
> xpcshell without having attempted to connect to the site in question in a
> while (maybe 15-30 minutes?) it will fairly reliably hit the assertion. I
> managed to get a packet capture as well, so I'll attach that too. (I imagine
> rr would be helpful in debugging this: http://rr-project.org/ )

Thanks for this, I'll give it a shot and see what comes of it. And I agree about rr... if I could ever get the damn thing to work on my machine! ;)
So this does not appear (so far) to be padding-related; the OnWriteSegment implementation that's apparently returning 0 for countWritten is in Http2Stream (which has no knowledge of padding), not Http2Session (which handles padding). Unfortunately, this appears to be more... problematic. With logging enabled, I never see *any* OnWriteSegment call result in countWritten == 0, and furthermore, when running xpcshell under lldb, the top frame of the crash has no data from the transaction available in it (at least on my mac, haven't tried on my linux machine yet), seeming to indicate that the transaction has gone away somehow. I'm rebuilding rr from latest master on my linux machine right now in the hopes that I'll actually get it to work this time, but those hopes are pretty low.
Assignee: nobody → hurley
Whiteboard: [necko-active]
OK, I want to do some more checking to make sure I'm diagnosing this right (I'm writing this just after coming up with said diagnosis), but what appears to be happening is this: On some runs (not sure why only some), when the server is done sending data on a stream, instead of piggybacking the END_STREAM flag on the last DATA frame with actual data in it, it sends the END_STREAM flag on a separate, empty DATA frame. These are the runs that seem to be causing issues. On runs that work fine, END_STREAM is always included on a DATA frame that has actual data in it.

I'm not seeing any code (on my first go-through) that would handle this case properly, so I think that's the right thing, but like I said, this is the first-run diagnosis, so I want to make sure I'm right before posting a patch.
Argh, nevermind, I think I'm wrong in this diagnosis. It could still be something to do with the empty DATA frame with END_STREAM, but I'm less confident in that now - could be a red herring.
So turns out I was close, but not quite there - we are receiving yet another 0-length DATA frame, but *without* END_STREAM. This is actually the first data frame we get, both in :keeler's attached pcap, and in all of my tests. While silly, the RFC doesn't prohibit an empty DATA frame, so I suppose we should handle it without freaking out :)

Patch incoming once I determine the right place & way to fix the bug.
Attached patch patchSplinter Review
Here we go, test included.
Attachment #8725489 - Flags: review?(mcmanus)
Comment on attachment 8725489 [details] [diff] [review]
patch

Review of attachment 8725489 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm.. I would say request aurora uplift when it lands on mc
Attachment #8725489 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #13)
> Comment on attachment 8725489 [details] [diff] [review]
> patch
> 
> Review of attachment 8725489 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> lgtm.. I would say request aurora uplift when it lands on mc

I'm not sure it's worth doing that - this is a debug-mode assert that's being hit, how many people do we have running aurora or stabler (is that a word?) in a debug build?
ack.. if you're sure that's the worst thing that happens :)
Pretty confident in that - the transaction (where we're asserting) already handles the case where countWritten is effectively 0 for actual response data (as opposed to response headers) by handling the case where we finish reading all the headers but there's no data in the buffer to be handled yet. That's the case we have here, and looking at the code, I'm confident we can extrapolate proper handling of this case to proper handling of the case where we'd have a 0-length DATA frame somewhere in the middle of other DATA frames (though I've not seen that behavior). We'll keep an eye on things just in case, but for now, I'm going to forego the uplift.
https://hg.mozilla.org/mozilla-central/rev/b71d9316ae22
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: