crash in mozilla::net::Http2Compressor::ProcessHeader(mozilla::net::nvPair, bool, bool)

VERIFIED FIXED in Firefox 40

Status

()

Core
Networking: HTTP
--
critical
VERIFIED FIXED
3 years ago
2 years ago

People

(Reporter: tracy, Assigned: nwgh)

Tracking

({crash})

Trunk
mozilla41
crash
Points:
---

Firefox Tracking Flags

(firefox40 verified, firefox41 verified)

Details

(Whiteboard: [spdy], crash signature)

Attachments

(1 attachment, 3 obsolete attachments)

(Reporter)

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-22130e02-be81-4b9d-b307-967032150422.
=============================================================
(Assignee)

Comment 1

3 years ago
This looks like a case of gStaticHeaders being null when mHeaderTable.Length is being called. Not sure how that could happen, but I'll dig into it.

Patrick, any ideas off the top of your head?
Flags: needinfo?(mcmanus)
does anything shutdown and startup the socket thread? That might do it.. daniel's link change detection code for example?
Flags: needinfo?(mcmanus)
Whiteboard: [spdy]
(Assignee)

Comment 3

3 years ago
That seems... plausible (and I have a patch to fix it, if that's the real cause here), but shouldn't the shutting down of the socket thread cause h2 sessions to go away (since they no longer have their connections)? A non-gone-away h2 session in conjunction with the socket thread being restarted is the only way I can see the socket thread restart causing this issue.
(Assignee)

Comment 4

3 years ago
(let me clarify what my wondering in comment 3 was) - the real question is, (1) am I wrong, and we *do* expect h2 sessions to stick around (in which case my patch is likely right, or at least not totally incorrect), (2) am I right, but papering over this (by initializing the static table in nsHttpConnectionMgr::Init instead of nvFIFO ctor) seems like a reasonable course to just fix the crash, (3) am I right, and we should find out why h2 sessions are sticking around without landing the paper-over-it patch? (patch incoming for inspection purposes)
(Assignee)

Comment 5

3 years ago
Created attachment 8600164 [details] [diff] [review]
patch

Not requesting review yet, based on the questions posed in comment 4, but here's a try run that I expect to fail on some obscure platform and cause me to rework the patch regardless of the answer to my questions

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6d33a2d0308c
Assignee: nobody → hurley

Updated

3 years ago
status-firefox41: --- → affected
(Assignee)

Comment 7

3 years ago
Created attachment 8609695 [details] [diff] [review]
patch

Yeah, that last one didn't work at all. This one, however, works. It's slightly more invasive than the last one, but overall way safer.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=76314c21d2b6
Attachment #8600164 - Attachment is obsolete: true
Attachment #8609695 - Flags: review?(mcmanus)
Comment on attachment 8609695 [details] [diff] [review]
patch

Review of attachment 8609695 [details] [diff] [review]:
-----------------------------------------------------------------

so this creates a lot of c++ static constructors.. which are frowned upon.

Can we just leave the strings lazily constructed (like they are now), but clean them up in nsNetModule::nsNetShutdown ?

also, separately, are we missing some kind of session cleanup here? This seems like an odd state.

::: netwerk/protocol/http/Http2Compression.cpp
@@ +163,2 @@
>      MOZ_ASSERT(false);
>      NS_WARNING("nvFIFO Table Out of Range");

I think this should still return rather than accessing out of bounds
Attachment #8609695 - Flags: review?(mcmanus)
(Assignee)

Comment 9

3 years ago
Ah yes, the dreaded static ctors *facepalm*. I still think I can do this without lazy constructing, and without static ctors. If that fails, then nsNetShutdown sounds like the right place to clean these up.

We definitely seem to be missing some kind of session cleanup - otherwise, I have no idea how we'd be getting into this state. As mentioned earlier in the bug, it's probably during network change events that something gets missed. Once we get the crash fix landed, I'll file a followup to figure that part out.

Good catch on the missing return - that'll be in the next iteration of the patch :)
(Assignee)

Comment 10

3 years ago
Created attachment 8610822 [details] [diff] [review]
patch

This should avoid the whole static ctor badness from the previous version, while still not having any pointers that could disappear from underneath us :)
Attachment #8609695 - Attachment is obsolete: true
Attachment #8610822 - Flags: review?(mcmanus)
Comment on attachment 8610822 [details] [diff] [review]
patch

Review of attachment 8610822 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/Http2Compression.cpp
@@ -198,2 @@
>    }
> -  return static_cast<nvPair *>(gStaticHeaders->ObjectAt(index));

I think I'm going to object that getting rid of this is just too expensive.. the new code adds a copy for every access of a static name.

I guess I see 3 ways to resolve that
1] convincing me that the static table isn't really used much
2] some clever use of dependent strings here which don't copy
3] just moving the table's dtor.
4] ???

@@ +199,5 @@
>    }
>  }
>  
>  void
>  Http2BaseCompressor::DumpState()

this is really a different bug, but this function should have
if (!LOG_ENABLED()) { return ; }.. so we aren't iterating this table all the time without logging

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +143,5 @@
>      }
>  
> +    nsresult rv = EnsureSocketThreadTarget();
> +
> +    return rv;

this doesn't seem to make a real change
Attachment #8610822 - Flags: review?(mcmanus) → review-

Updated

3 years ago
Blocks: 1167562
(Assignee)

Comment 13

3 years ago
(In reply to PTO - Patrick McManus [:mcmanus] from comment #12)
> Comment on attachment 8610822 [details] [diff] [review]
> patch
> 
> Review of attachment 8610822 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/protocol/http/Http2Compression.cpp
> @@ -198,2 @@
> >    }
> > -  return static_cast<nvPair *>(gStaticHeaders->ObjectAt(index));
> 
> I think I'm going to object that getting rid of this is just too expensive..
> the new code adds a copy for every access of a static name.
> 
> I guess I see 3 ways to resolve that
> 1] convincing me that the static table isn't really used much

FWIW, I'm gut-confident that this is the case (based on all the log reading I've done over the past couple years), but have precisely 0 data to back it up, and I don't think holding up a crash fix to gather that data is worth it, so let's skip this option :)

> 2] some clever use of dependent strings here which don't copy

I'm really tempted to do this one (the whole dynamically-allocated static table thing has always rubbed me the wrong way), but the effort involved is probably not really worth it, hence...

> 3] just moving the table's dtor.

Yeah, let's just do this one.

> @@ +199,5 @@
> >    }
> >  }
> >  
> >  void
> >  Http2BaseCompressor::DumpState()
> 
> this is really a different bug, but this function should have
> if (!LOG_ENABLED()) { return ; }.. so we aren't iterating this table all the
> time without logging

I'm fine sliding that change in with this patch.

> ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
> @@ +143,5 @@
> >      }
> >  
> > +    nsresult rv = EnsureSocketThreadTarget();
> > +
> > +    return rv;
> 
> this doesn't seem to make a real change

Yeah, that was from a previous iteration when I tried something different that didn't work, and just didn't back it out all the way. It's already totally undone in my local tree (I noticed just after I posted the patch to bugzilla).
Attachment #8611285 - Flags: review?(mcmanus) → review+
(Assignee)

Comment 15

3 years ago
And of course inbound is closed as I'm leaving for the day - perhaps this will get checked in before I return to do it myself.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c09eba908cb9
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox41: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
nick, backport this one?
Flags: needinfo?(hurley)
(Assignee)

Comment 21

3 years ago
Comment on attachment 8611285 [details] [diff] [review]
option 3 patch

Approval Request Comment
[Feature/regressing bug #]: 939318 (or similar - a combo of h2 and network change detection)
[User impact if declined]: crashes when changing networks while speaking h2
[Describe test coverage new/current, TreeHerder]: none
[Risks and why]: very low - just moves de-initialization to a later point (during shutdown)
[String/UUID change made/needed]: none
Flags: needinfo?(hurley)
Attachment #8611285 - Flags: approval-mozilla-aurora?
Comment on attachment 8611285 [details] [diff] [review]
option 3 patch

Fix a crash, low risk, taking it.
Attachment #8611285 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Socorro [1] shows no more crashes in builds after this fix landed. 

[1] - https://crash-stats.mozilla.com/report/list?range_unit=days&range_value=28&signature=mozilla%3A%3Anet%3A%3AHttp2Compressor%3A%3AProcessHeader%28mozilla%3A%3Anet%3A%3AnvPair%2C+bool%2C+bool%29#tab-reports
Status: RESOLVED → VERIFIED
status-firefox40: fixed → verified
status-firefox41: fixed → verified
Duplicate of this bug: 1167562
You need to log in before you can comment on or make changes to this bug.