Closed Bug 1247982 Opened 4 years ago Closed 4 years ago

crash in nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::RemoveElementsAt | mozilla::net::nsHttpRequestHead::~nsHttpRequestHead regressing in Firefox 43

Categories

(Core :: Networking: HTTP, defect, critical)

43 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
e10s - ---
firefox44 --- wontfix
firefox45 + wontfix
firefox46 - wontfix
firefox47 --- wontfix
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: philipp, Assigned: dragana)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [necko-active])

Crash Data

Attachments

(2 files, 3 obsolete files)

[Tracking Requested - why for this release]:
this is a recent regression & a mid-level crasher

This bug was filed from the Socorro interface and is 
report bp-ab1a8ec1-0a63-4e60-9868-513812160127.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	std::_Atomic_fetch_add_4(unsigned long volatile*, unsigned long, std::memory_order) 	c:/tools/vs2013/vc/include/xatomic.h:1674
1 	xul.dll 	nsTArray_Impl<mozilla::net::nsHttpHeaderArray::nsEntry, nsTArrayInfallibleAllocator>::DestructRange(unsigned int, unsigned int) 	xpcom/glue/nsTArray.h
2 	xul.dll 	nsTArray_Impl<mozilla::net::nsHttpHeaderArray::nsEntry, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned int, unsigned int) 	xpcom/glue/nsTArray.h
3 	xul.dll 	mozilla::net::nsHttpRequestHead::~nsHttpRequestHead() 	netwerk/protocol/http/nsHttpRequestHead.cpp
4 	xul.dll 	mozilla::net::HttpBaseChannel::~HttpBaseChannel() 	netwerk/protocol/http/HttpBaseChannel.cpp
5 	xul.dll 	mozilla::net::nsHttpChannel::`scalar deleting destructor'(unsigned int) 	
6 	xul.dll 	nsInputStreamChannel::Release() 	netwerk/base/nsInputStreamChannel.cpp
7 	xul.dll 	NS_ProxyRelease(nsIEventTarget*, nsISupports*, bool) 	xpcom/glue/nsProxyRelease.cpp
8 	xul.dll 	nsTransportEventSinkProxy::~nsTransportEventSinkProxy() 	netwerk/base/nsTransportUtils.cpp
9 	xul.dll 	nsTransportEventSinkProxy::`scalar deleting destructor'(unsigned int) 	
10 	xul.dll 	nsTransportEventSinkProxy::Release() 	netwerk/base/nsTransportUtils.cpp
11 	xul.dll 	nsCOMPtr_base::~nsCOMPtr_base() 	xpcom/glue/nsCOMPtr.h
12 	xul.dll 	mozilla::net::nsHttpTransaction::~nsHttpTransaction() 	netwerk/protocol/http/nsHttpTransaction.cpp
13 	xul.dll 	mozilla::net::nsHttpTransaction::`scalar deleting destructor'(unsigned int) 	
14 	xul.dll 	mozilla::net::DeleteHttpTransaction::Run() 	netwerk/protocol/http/nsHttpTransaction.cpp
15 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
16 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp
17 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
18 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
19 	xul.dll 	nsThreadManager::nsThreadManager() 	xpcom/threads/nsThreadManager.h
20 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp
21 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp
22 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp
23 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp
24 	xul.dll 	XREMain::XRE_main(int, char** const, nsXREAppData const*) 	toolkit/xre/nsAppRunner.cpp
25 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp

this windows crash signature and regressing since firefox 43, and there are no obvious correlations to any addons/modules causing it.
on release it is currently the #28 crasher making up 0.40% of all crashes.
Component: General → Networking: HTTP
Daniel, can you help here? 
This is top crash #23 (windows only)
thanks
(I don't see a trace of a single crash in 47)
Flags: needinfo?(daniel)
Really tricky thing, this. Thee chain nsHttpRequestHead, HttpBaseChannel and nsInputStreamChannel have all remained unmodified for a while even before this so I'd say this crash is more likely to be caused by a change in behavior in some foundation code. 

Is there a way for us to do some range checks, as in can we figure out when this problem first showed up?

It is so very few version 46 crashes too so I fear the lack of 47 crashes could just be just bad/good luck so far.
Flags: needinfo?(daniel)
Thanks Daniel!

Kairo, could you help Daniel with the range checks? Danke!
Flags: needinfo?(kairo)
(In reply to Daniel Stenberg [:bagder] from comment #2)
> Is there a way for us to do some range checks, as in can we figure out when
> this problem first showed up?

As the crash signature happens back to at least Firefox 39 and mid-last-year, I don't know what to look for there.
Flags: needinfo?(kairo)
fixing this would move the needle on crashes
Assignee: nobody → daniel
Whiteboard: [necko-active]
See Also: → 1250816
Duplicate of this bug: 1250816
I think I might have an idea for this, but I'm not sure if would work yet.
As I mentioned in bug 1250816 comment 4, it seems the crash occurs when because mHeaders in nsHttpRequestHead is corrupted.
So we create a class to act as a memory guard, on which we set PROT_NONE, which we place around mHeaders or mRequestHead.
This way, whenever the memory around mHeaders is corrupted we get a nice stack trace, so we can find the cause.
Then we need to send the build to all the reporters who mentioned this happens often - I assume we can do that.
Added signature from dupe. No ideas personally but know it is one of the top 5 signatures pushing up e10s crashes.
Starting with std::_Atomic_fetch_add_4 reminds me of semi recent fix https://hg.mozilla.org/mozilla-central/rev/3b0dafa67477 (Bug most definitely not related.)

Control group from e10s b45.2 experiment [1]. the experiment failed to filter out accessibility, about 10% more users were added extra that e10s group didn't contain.
18/31 are with accessibility. (~nsHttpRequestHead signature)

Experiment group [2]. No accessibility.
20 (~nsHttpRequestHead signature)
110 (OnStartRequest signature)

[1] https://crash-stats.mozilla.com/search/?build_id=20160215141016&process_type=browser&process_type=content&ActiveExperiment=%3De10s-beta45-withoutaddons%40experiments.mozilla.org&date=%3E%3D2016-02-16&date=%3C2016-02-23&ActiveExperimentBranch=%3Dcontrol-no-addons&signature=~nsTArray_Impl%3CT%3E%3A%3ADestructRange&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

[2] https://crash-stats.mozilla.com/search/?build_id=20160215141016&process_type=browser&process_type=content&ActiveExperiment=%3De10s-beta45-withoutaddons%40experiments.mozilla.org&date=%3E%3D2016-02-16&date=%3C2016-02-23&ActiveExperimentBranch=%3Dexperiment-no-addons&signature=~nsTArray_Impl%3CT%3E%3A%3ADestructRange&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Crash Signature: [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::RemoveElementsAt | mozilla::net::nsHttpRequestHead::~nsHttpRequestHead] → [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::RemoveElementsAt | mozilla::net::nsHttpRequestHead::~nsHttpRequestHead] [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::ReplaceElementsAt<T> | mozilla::net::HttpChannelChild::OnStartRequest]
tracking-e10s: --- → ?
Just to note, it seems there are a bunch of crashes with a similar signature (nsTArray_Impl<T>::RemoveElementsAt) that are not in necko. There is a high probability that the problem is somewhere in nsTArray.

https://crash-stats.mozilla.com/search/?signature=~nsTArray_Impl%3CT%3E%3A%3ADestructRange&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
(In reply to Valentin Gosu [:valentin] from comment #9)
> Just to note, it seems there are a bunch of crashes with a similar signature
> (nsTArray_Impl<T>::RemoveElementsAt) that are not in necko. There is a high
> probability that the problem is somewhere in nsTArray.
> 
> https://crash-stats.mozilla.com/search/
> ?signature=~nsTArray_Impl%3CT%3E%3A%3ADestructRange&_facets=signature&_column
> s=date&_columns=signature&_columns=product&_columns=version&_columns=build_id
> &_columns=platform#facet-signature

Hm, three out of the top four crashes in that list are Necko crashes:

https://crash-stats.mozilla.com/signature/?signature=nsTArray_Impl%3CT%3E%3A%3ADestructRange+|+nsTArray_Impl%3CT%3E%3A%3ARemoveElementsAt+|+mozilla%3A%3Anet%3A%3AnsHttpResponseHead%3A%3A~nsHttpResponseHead&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports
https://crash-stats.mozilla.com/signature/?signature=nsTArray_Impl%3CT%3E%3A%3ADestructRange+|+nsTArray_Impl%3CT%3E%3A%3AReplaceElementsAt%3CT%3E+|+mozilla%3A%3Anet%3A%3AHttpChannelChild%3A%3AOnStartRequest&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports
https://crash-stats.mozilla.com/signature/?signature=nsTArray_Impl%3CT%3E%3A%3ADestructRange+|+nsTArray_Impl%3CT%3E%3A%3ARemoveElementsAt+|+mozilla%3A%3Anet%3A%3AnsHttpRequestHead%3A%3A~nsHttpRequestHead&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports

Virtually all of the crashes are at address (size_t)-8, which indicates we have a null pointer somewhere.

Let's look at one of those crashes:

https://crash-stats.mozilla.com/report/index/5f286bee-a18a-4bb3-9a08-a6ab32160229

Downloading the dump file and disassembling the DestructRange function:

// This block is setting up pointers for the current element and the next element.
// 14h == 0x14 == 20 is sizeof(nsHttpHeaderArray::nsEntry).
// esi is the current element, ebx the pointer just beyond the last element.
xul!nsTArray_Impl<mozilla::net::nsHttpHeaderArray::nsEntry,nsTArrayInfallibleAllocator>::DestructRange:
 2006 6bce359e 6b54240414      imul    edx,dword ptr [esp+4],14h
 2007 6bce35a3 53              push    ebx
 2008 6bce35a4 6b5c240c14      imul    ebx,dword ptr [esp+0Ch],14h
 2008 6bce35a9 56              push    esi
 2008 6bce35aa 8b31            mov     esi,dword ptr [ecx]
 2008 6bce35ac 83c208          add     edx,8
 2008 6bce35af 03f2            add     esi,edx
 2008 6bce35b1 03de            add     ebx,esi
 2009 6bce35b3 3bf3            cmp     esi,ebx
 2009 6bce35b5 742d            je      (6bce35e4)  Branch

// Save ebp because we're going to use it in the code below.  Note that
// we're only going to save it if there's actually elements to destroy,
// which is a nice demonstration of shrink-wrapping by the compiler.
DestructRange+0x19:
 2009 6bce35b7 55              push    ebp

// This block and subsequent bits is is nsSubstring.cpp:ReleaseData, inlined.
// The bare 4 is nsSubstring::F_SHARED.  The layout of nsTSubstring is:
//
// char* mData (offset 0)
// unsigned mLength (offset 4)
// unsigned mFlags (offset 8)
//
// But esi isn't pointing an an nsTSubstring; it's pointing at an nsEntry,
// and the string we're interested in is at offset 4 inside an nsEntry structure.
// Hence, given an nsEntry pointer, |mData| of |nsEntry::value| would be at
// offset 4, |mLength| at offset 8, and so forth.
//
// The |test| instruction is checking |mFlags|; the |mov| instruction is
// loading the |mData| pointer.
DestructRange+0x1a:
 2010 6bce35b8 f6460c04        test    byte ptr [esi+0Ch],4
 2010 6bce35bc 8b4604          mov     eax,dword ptr [esi+4]
 2010 6bce35bf 7428            je      (6bce35e9)  Branch

// This is the F_SHARED case of ReleaseData.  We need to do an atomic
// decrement of nsStringBuffer's reference count.  So we convert the |mData|
// pointer into an nsStringBuffer (nsStringBuffer::FromData); this is the
// |lea| instruction below.  We push (in program order) the memory ordering,
// the amount we're going to add (-1), and the pointer to the refcount, which
// is identical to the pointer to the string buffer itself.
DestructRange+0x23
 2010 6bce35c1 6a05            push    5
 2010 6bce35c3 8d68f8          lea     ebp,[eax-8]
 2010 6bce35c6 6aff            push    0FFFFFFFFh
 2010 6bce35c8 55              push    ebp
 2010 6bce35c9 e8c884d7ff      call    xul!std::_Atomic_fetch_add_4 (6ba5ba96)
 2010 6bce35ce 83c40c          add     esp,0Ch
 2010 6bce35d1 48              dec     eax
 2010 6bce35d2 7508            jne     (6bce35dc)  Branch

There's other bits of that function that I'm not going to annotate, because we have enough to see what's going on: some |value| member of an nsEntry had a nullptr |mData| field, which got translated into an |(size_t)-8| address to pass to _Atomic_fetch_add_4.

How did mData get to be nullptr?  That is not at all clear to me.  We have a couple cases in the string code where mData starts out as nullptr, but those appear to be transient conditions.  We guard against assigning nullptr to an nTSubstring in nsTSubstring::Assign...Is it possible that somehow we have a null entry in the array?  I don't see any way that could happen from the nsHttpHeaderArray code, but perhaps I'm not looking hard enough (nsHttpHeaderArray::SetEmptyHeader *does* initialize the new entry's |value|, fwiw).

I don't see how this is an nsTArray problem, but perhaps I'm not being imaginative enough. ;)
Flags: needinfo?(valentin.gosu)
That means the initial assumption of memory corruption is correct, or somehow we end up nulling out the pointer backing nsEntry::value via one of the string APIs - which is less likely.
Flags: needinfo?(valentin.gosu)
(In reply to Valentin Gosu [:valentin] from comment #11)
> That means the initial assumption of memory corruption is correct, or
> somehow we end up nulling out the pointer backing nsEntry::value via one of
> the string APIs - which is less likely.

Memory corruption where every instance is writing zero in a single field is pretty convenient (note that it's not the entire string getting zero'd out, or |mFlags| would be zero too).  Are these header arrays used on multiple threads?
It seems that the nsHttpHeaderArray methods are called on multiple threads.
[Tracking Requested - why for this release]:
Too late for 45 but we have enough crashes on 46 to discuss about a tracking.
(In reply to Valentin Gosu [:valentin] from comment #13)
> It seems that the nsHttpHeaderArray methods are called on multiple threads.

Is that with appropriate locking?  I admit that I haven't seen nsHttpHeaderArray come up in TSan races, but maybe circumstances are different out in the wild...
This doesn't appear to be e10s specific.
IIRC our plan here it to try adding locking to these calls and see if the issue goes away.  Is that right Daniel?
Flags: needinfo?(daniel)
I was hoping Valentin would clarify after Nathan's question. What (unlocked?) accesses from multiple threads were you talking about?
Flags: needinfo?(daniel) → needinfo?(valentin.gosu)
In nsHttpHeaderArray I see the methods getting called from different threads, and there's no locking as far as I can tell.
From what I understand, the crash is due to the pointer backing nsEntry::value is null.
One way I see this happening due to threading would be if one thread calls mHeaders.AppendElement(); and the other is iterating through the array before the constructor for that element gets called to initialize the objects. I can't verify if this correct though, and it would be a bit odd if we were still appending to the array while another array was destroying it.
Flags: needinfo?(valentin.gosu)
I don't see any crashes on 46 for these signatures or for Nathan's search. But I could be missing something here. KaiRo can you tell if this affects 46?  Thanks!
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #20)
> KaiRo can you tell if this affects 46?  Thanks!

According to https://mozilla.github.io/bug-signatures-status/#/bug/1247982 it was seen in 46 Dev Edition but not (yet) in 46.0b1
Untrack e10s-crashes.
No longer blocks: e10s-crashes
nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::ReplaceElementsAt<T> | mozilla::net::HttpChannelChild::OnStartRequest
has probably been replaces by
mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::net::PHttpChannelChild::FatalError | mozilla::net::PHttpChannelChild::OnMessageReceived in 46 beta.
(In reply to Jonathan Howard from comment #23)
> nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::ReplaceElementsAt<T> |
> mozilla::net::HttpChannelChild::OnStartRequest
> has probably been replaces by
> mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError |
> mozilla::net::PHttpChannelChild::FatalError |
> mozilla::net::PHttpChannelChild::OnMessageReceived in 46 beta.

The signature looks very different.
Maybe the wrong one, (that was only in 46b1 and gone from b2.)
ReleaseData signature possible more likely bug 1145613 comment 16
I'm not seeing this crash for 46, so I don't think we need to track it.
I have look a bit into this:

We have a function that returns a raw pointer to the RequestHeaders,  a bit scary. But it is not done anything wrong with it. It is only use to access function like GetHeader, GetMethod... so it is safe.

I little bit scary is that is used over nsHttpTransaction which holds a raw pointer to nsHttpRequestHeader and
nsHttpTransaction changes request headers too.

e.g:
https://crash-stats.mozilla.com/report/index/ab1a8ec1-0a63-4e60-9868-513812160127
in this crash nsHttpTransaction is being destroyed so the only possibility is that nsHttpTransaction is used in the same time as it is being destroyed.

There are other crashes where:
https://crash-stats.mozilla.com/report/index/b3a152f8-bb8a-4533-b7eb-c53892160405
here only nsHttpChannel is being destroyed.

Anyway nsHttpRequestHead is not kept as a pointer in nsHttpChannel so it will be gone when channel is gone:
http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpBaseChannel.h#375

So we are using nsHttpTransction when it is being freed and/or we still have pointer to nsHttpRequestHead in nsHttpTransaction when the channel is being destroyed. Putting a lock will not solve the problem.

It is possible that I have missed something in this analysis :)
Something to add:
One more possibility -  at some point, we add elements to the array at the same time from two different threads and that is why array is corrupted. (I am not sure if this is possible)
this is a weird structure - it is one of the few pieces of data shared between the socket and main threads. It has never had locking and it has never had any kind of memory barrier to ensure consistent views. Its model is really point in time - the main thread owns it before setting up the http transaction, then the http transaction owns it until onstartrequest() is called and then the main thread owns it again.

its been sketchy since the days before phoenix afaict. and its certainly prone to regression.

Dragana, it would be ok to lock all these methods. It would be interesting to implement a wrapper around AutoLock for it that contains an Atomic Bool with the state of the lock and DIAGNOSTIC_ASSERT (?) it to be free just before obtaining it. It would miss finding the offender in one race condition, but it wouldn't have a false positive problem.
This may be the same as e10s crash bug 1145613, which was our #4 top crasher in content for our beta 46 experiment. That crash always seems to happen near the startup of the content process. There's some additional debug info and crashstats links in that bug, although you can ignore everything prior to bug 1145613, comment 15.
Daniel, do you want to continue working on this, it is assigned to you? or should I take it?

We could add also a check if the right thread is accessing it at specific time - from comment 29 "the main thread owns it before setting up the http transaction, then the http transaction owns it until onstartrequest() is called and then the main thread owns it again"

e10s cases - I m not sure it is the same issue.
Flags: needinfo?(daniel)
Please have a go at it, Dragana. I've got stuck elsewhere.
Assignee: daniel → dd.mozilla
Status: NEW → ASSIGNED
Flags: needinfo?(daniel)
Dragana, curious if I can get an update on this bug. Have you had a chance to look at it? If not, when do you think you'll be able to?
Flags: needinfo?(dd.mozilla)
(In reply to Jim Mathies [:jimm] from comment #34)
> Dragana, curious if I can get an update on this bug. Have you had a chance
> to look at it? If not, when do you think you'll be able to?

I am working on it. I decided not to just add a lock, but to figure out what went wrong and that takes longer since nsHttpRequestHead is used from 4 different threads :)
Flags: needinfo?(dd.mozilla)
I tried to make control from which thread headers are access, but it got complicated so going for an easier way.

I still think that nsHttpTransaction::Restart() is the problem, but I do not have a prove. Just before Restart is called,  mActivityDistributor->ObserveActivity is called.

So I added lock on header array and a check hopefully it will be triggered.
To add a lock on nsHttpRequestHead will need rewrite a big part of nsHttpRequestHead so lets try without that.
mWriteInProgress is check out of the lock, so it is possible the assert will not be triggered always when it should be, but this is just for a diagnostic so it is enough to be triggered once.
The lock is to be sure of proper writing into the array and hopefully no more crashes.
Attachment #8745263 - Flags: review?(mcmanus)
Attachment #8745263 - Attachment is obsolete: true
Attachment #8745263 - Flags: review?(mcmanus)
Attachment #8745264 - Flags: review?(mcmanus)
Comment on attachment 8745264 [details] [diff] [review]
bug_1247982_lockheaderArray.patch

Review of attachment 8745264 [details] [diff] [review]:
-----------------------------------------------------------------

I think we ought to lock all methods of this class based on the data we have.. a read with a write is also bad news and it isn't meant to ever be used simultaneously on different threads so we won't be serializing multiple readers
Attachment #8745264 - Flags: review?(mcmanus) → feedback+
(In reply to Patrick McManus [:mcmanus] from comment #40)
> Comment on attachment 8745264 [details] [diff] [review]
> bug_1247982_lockheaderArray.patch
> 
> Review of attachment 8745264 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think we ought to lock all methods of this class based on the data we
> have.. a read with a write is also bad news and it isn't meant to ever be
> used simultaneously on different threads so we won't be serializing multiple
> readers

I tried to make a control from which thread mHttpRequestHead is being accessed:
It starts with main thread until asyncOpen is called, then it can be cache2I/O, cacheI/O or mainthread but only a single function (actually 2, but one is onle cache(2)I/O) is called so it is possible to control it. The cache part can be skipped.
Then it is socketThread and this is a bit problematic because we call mActivityDistributor->ObserveActivity couple of times and there mHttpRequestHead is access from the main thread. It is fine to assume socketThread until headers are written out but nsHttpTransaction::Restart is call after that and it changes header array. So if I put MOZ_DIAGNOSTIC_ASSERT(!mHeaders->mWriteInProgress); to all functions we will crash very often.

I will rewrite part where nsHttpTransaction::Restart is called. And try to make a patch that will return error if the thread is not the right one, e.g. calling GetHeader from js after asyncopen is called. (I think we do not have guards like that)
Attached patch bug_1247982.patch (obsolete) — Splinter Review
One note:
there is a bit not nice way how I am passing parameters to SendOnStartRequest. I do not want to do an extra copying of headers. To avoid copying there is an option to pass nsHttpRequetHead as a parameter and to lock in
static void ParamTraits<mozilla::net::nsHttpRequestHead>::Write(Message* aMsg, const mozilla::net::nsHttpRequestHead& aParam)

but here headers are past as a const so only option is a hack or rewriting ParamTraits. (I am not sure it is worth it.)
Attachment #8745264 - Attachment is obsolete: true
Attachment #8747034 - Flags: review?(mcmanus)
Comment on attachment 8747034 [details] [diff] [review]
bug_1247982.patch

Review of attachment 8747034 [details] [diff] [review]:
-----------------------------------------------------------------

this is tremendous. Thank you. We might need to rename you bulldog.

the review is pretty much nits.. and I have a general feeling we can get some of those consts back.. but if not we can live with it.

::: netwerk/protocol/http/nsHttpRequestHead.cpp
@@ +30,5 @@
>  {
>      MOZ_COUNT_DTOR(nsHttpRequestHead);
>  }
>  
> +// Don't use this function. It is only used by HttpChannelarent to avoid

typo (Parent)

@@ +74,5 @@
> +    return mHeaders.Count();
> +}
> +
> +nsresult
> +nsHttpRequestHead::VisitHeaders(nsIHttpHeaderVisitor *visitor,

I think this is worth documenting in the .h file that it lets you iterate all the headers atomically under one lock

::: netwerk/protocol/http/nsHttpRequestHead.h
@@ +46,5 @@
> +    void Method(nsACString &aMethod);
> +    nsHttpVersion Version();
> +    void RequestURI(nsACString &RequestURI);
> +    void Path(nsACString &aPath);
> +    void SetHTTPS(bool val); 

tailing whitespace

@@ +56,3 @@
>  
> +    nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m=false);
> +    nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m, 

trailing whitespace

@@ +57,5 @@
> +    nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m=false);
> +    nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m, 
> +                       nsHttpHeaderArray::HeaderVariety variety);
> +    nsresult SetEmptyHeader(nsHttpAtom h);
> +    nsresult GetHeader(nsHttpAtom h, nsACString &v);

this lost a const.. on purpose? findheadervalue() and HasHeaderValue() and IsSafeMethod() too..

@@ +93,3 @@
>  
> +    bool IsOptions();
> +    bool IsConnect();

so these is*() things that just call EqualsMethod can stay in the .h file, right?
Attachment #8747034 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #44)
> Comment on attachment 8747034 [details] [diff] [review]
> bug_1247982.patch
> 
> Review of attachment 8747034 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +57,5 @@
> > +    nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m=false);
> > +    nsresult SetHeader(nsHttpAtom h, const nsACString &v, bool m, 
> > +                       nsHttpHeaderArray::HeaderVariety variety);
> > +    nsresult SetEmptyHeader(nsHttpAtom h);
> > +    nsresult GetHeader(nsHttpAtom h, nsACString &v);
> 
> this lost a const.. on purpose? findheadervalue() and HasHeaderValue() and
> IsSafeMethod() too..
> 

We want to lock this functions too, in const function we cannot call mLock.Lock();

> @@ +93,3 @@
> >  
> > +    bool IsOptions();
> > +    bool IsConnect();
> 
> so these is*() things that just call EqualsMethod can stay in the .h file,
> right?

I moved everything at first, and forgot to move some back :)
typos fix.
Attachment #8747034 - Attachment is obsolete: true
Attachment #8748068 - Flags: review+
Keywords: checkin-needed
Depends on: 1269738
https://hg.mozilla.org/mozilla-central/rev/2f32c58b741b
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Duplicate of this bug: 609895
This is marked as a regression in 47; can we uplift to beta?
Flags: needinfo?(rkothari)
andrew, I don't agree with the regression tag. This is a day one core race condition problem and the crash signatures have probably shifted along the way depending on what else was going on. It was originally filed with 43, but is probably an issue longer than that.

it's also a fairly complicated patch. If dragana felt confident in an uplift to 48 I would support that, but I would not support 47 at this point.
(In reply to Andrew Overholt [:overholt] from comment #50)
> This is marked as a regression in 47; can we uplift to beta?

Hi Andrew, based on Patrick's assessment, this may be too risk to uplift at this stage, RC1 gtb today. We do want to fix recent regressions but we may have lost the runway to do that. 

I also looked at the crash volume on this one and it doesn't seem high enough to warrant uplifting at this stage. What do you think?
Flags: needinfo?(rkothari)
This is not a new regression and Patrick suggested we avoid uplifting to 47 as the patch is fairly risky to take this late in 47 cycle (RC week). This is now a wontfix for Fx47.
(In reply to Patrick McManus [:mcmanus] from comment #51)
> andrew, I don't agree with the regression tag. This is a day one core race
> condition problem and the crash signatures have probably shifted along the
> way depending on what else was going on. It was originally filed with 43,
> but is probably an issue longer than that.
> 
> it's also a fairly complicated patch. If dragana felt confident in an uplift
> to 48 I would support that, but I would not support 47 at this point.

Sorry for a delay, 47 is really risky (and not in question any more).

About 48, we could uplift it. It does touch a lot of header handling code, but I think it is fine to uplift.
We will need to uplift also bug 1269738 (a very small patch) and bug 1274509 (this bug changes an idl - setting header during VisitRequestHeader was an unsafe operation, really hope no one did it, now it is going to return an error)
Uplifting bug 507571 as the same time as this is probably worth doing IMO. (I'm assuming without verifying it has no dependencies.)
Dragana, can you request and coordinate the necessary 48 uplifts? It would be excellent if we got it done while that was still dev-channel
Flags: needinfo?(dd.mozilla)
Comment on attachment 8748068 [details] [diff] [review]
bug_1247982.patch

Approval Request Comment

This is approval request for 48 (in case it changes channels)

[Feature/regressing bug #]: A long existing bug
[User impact if declined]:Crash, but not very high volume.
[Describe test coverage new/current, TreeHerder]: This code is heavily used, but this patch touches almost every aspect of request header handling (it does not change header parsing which is good). It is hard to test all corner cases
[Risks and why]: It is hard to test all corner cases. 
[String/UUID change made/needed]:


We will need to uplift also bug 1269738 (a very small patch) and bug 1274509 (it is a low risk, the bug changes an idl - setting header during VisitRequestHeader was an unsafe operation, really hope no one did it, now it is going to return an error).
Order of uplift:
this patch,
bug 1269738
bug 1274509
Flags: needinfo?(dd.mozilla)
Attachment #8748068 - Flags: approval-mozilla-aurora?
Comment on attachment 8748068 [details] [diff] [review]
bug_1247982.patch

needed a rebase :)
Attachment #8748068 - Flags: approval-mozilla-aurora?
Approval Request Comment

This is approval request for 48 (in case it changes channels)

[Feature/regressing bug #]: A long existing bug
[User impact if declined]:Crash, but not very high volume.
[Describe test coverage new/current, TreeHerder]: This code is heavily used, but this patch touches almost every aspect of request header handling (it does not change header parsing which is good). It is hard to test all corner cases
[Risks and why]: It is hard to test all corner cases. 
[String/UUID change made/needed]:


We will need to uplift also bug 1269738 (a very small patch) and bug 1274509 (it is a low risk, the bug changes an idl - setting header during VisitRequestHeader was an unsafe operation, really hope no one did it, now it is going to return an error).
Order of uplift:
this patch,
bug 1269738
bug 1274509
Attachment #8759933 - Flags: approval-mozilla-aurora?
Comment on attachment 8759933 [details] [diff] [review]
bug_1247982_aurora.patch

Well, it is a very patch just before merge day, not taking it right now at least.
Comment on attachment 8759933 [details] [diff] [review]
bug_1247982_aurora.patch

Switching the approval flag so that it's for 48
Attachment #8759933 - Flags: approval-mozilla-aurora? → approval-mozilla-beta?
Crash Signature: [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::RemoveElementsAt | mozilla::net::nsHttpRequestHead::~nsHttpRequestHead] [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::ReplaceElementsAt<T> | mozilla::net::HttpChannelChild::OnStartRequest] → [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::RemoveElementsAt | mozilla::net::nsHttpRequestHead::~nsHttpRequestHead] [@ nsTArray_Impl<T>::DestructRange | nsTArray_Impl<T>::ReplaceElementsAt<T> | mozilla::net::HttpChannelChild::OnStartRequest] …
Comment on attachment 8759933 [details] [diff] [review]
bug_1247982_aurora.patch

Has been in m-c for a month, also now on aurora, fix an interesting volume of crashes on release, taking it.
Attachment #8759933 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.