Closed Bug 1293501 Opened 4 years ago Closed 2 years ago
Add debug annotation/code for the crash in Release
+++ This bug was initially created as a clone of Bug #1145613 +++ I have checked the code which set nsTSubstring_CharT::mFlags with F_SHRED, but I don't see any of them could allow null mData. Based on crash-stats, nsHttpHeaderArray has the highest chance to meet the crash. And from the minidumps, it's crashed while destructing the nsCString |mValue| of the first nsEntry in nsHttpHeaderArray::mHeaders, so I'd like to see memory content the array's header also the first two elements (which are not existed in the minidump). This bug may also be used for adding some more debug code.
:mcmanus is taking PTO til 08-15, could help to review this?
Attachment #8779142 - Flags: review?(jduell.mcbugs)
Assignee: nobody → janus926
Comment on attachment 8779142 [details] [diff] [review] part1 v1: add crash annotatios for the nsHttpHeaderArray Review of attachment 8779142 [details] [diff] [review]: ----------------------------------------------------------------- Seems ok to me!
Attachment #8779142 - Flags: review?(jduell.mcbugs) → review+
Updated reviewer. Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e647162c7cb4
Attachment #8779142 - Attachment is obsolete: true
This is for debug purpose, probably more patches to come. But for now there's only one.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/16abccc347ef Add debug annotations to track down the crash of bug 1145613. r=jduell
Attachment #8780951 - Attachment is obsolete: true
This time I'd like to know when |mData| is cleared to null, so I put a check in ReentrantMonitorAutoEnter, which will check before/after manipulating nsHttpHeaderArray.
Attachment #8785146 - Flags: review?(mcmanus) → review?(dd.mozilla)
Comment on attachment 8785146 [details] Bug 1293501 - Add some more crash annotations to dianose bug 1145613. https://reviewboard.mozilla.org/r/74448/#review72576 Thanks for doing this.
Attachment #8785146 - Flags: review?(dd.mozilla) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5661cf2d2b85 Add some more crash annotations to dianose bug 1145613. r=mcmanus
I'll see if I can use debug registers to add a watchpoint on |mData| like https://hg.mozilla.org/try/rev/db9d9d4e2dab9cffdfec027316381a0fb775c9f7.
Based on bug 1247982 comment 41, the debug registers would need to apply on multiple threads.
(In reply to Ting-Yu Chou [:ting] from comment #14) > Based on bug 1247982 comment 41, the debug registers would need to apply on > multiple threads. Seems not true in content process, I saw only main thread that is accessing nsHttpRequestHead.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/6e6697a3d251 Remove debug code for not propagating to Aurora. r=me
It'd become https://crash-stats.mozilla.com/search/?reason=~EXCEPTION_ACCESS_VIOLATION_WRITE&product=Firefox if it's captured by this code.
Comment on attachment 8798718 [details] [diff] [review] v4: use virtualprotect to track down possible corruption This protects only the |mData| of the |value| of nsEntry, I should protect also the memory after |header|.
Add padding between header and value of nsEntry, set the page (starts from the array header to the first nsEntry value's |mData|) read only when ReentrantMonitorAutoEnter destructs.
Attachment #8798764 - Flags: review?(dd.mozilla)
Attachment #8798764 - Flags: review?(dd.mozilla) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/2eebd44ff2e9 Use VirtualProtect to track down a possible memory corruption. r=dragana
I am using this search to monitor the possible culprit: https://crash-stats.mozilla.com/search/?reason=~EXCEPTION_ACCESS_VIOLATION_WRITE&build_id=>%3D20161012030211&product=Firefox&version=52.0a1&platform=Windows
A more accurate filter: https://crash-stats.mozilla.com/search/?reason=~EXCEPTION_ACCESS_VIOLATION_WRITE&build_id=%3E%3D20161012030211&address=%24c&product=Firefox&version=52.0a1&platform=Windows&date=%3E%3D2016-10-07T02%3A05%3A13.000Z&date=%3C2016-11-07T02%3A05%3A13.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=address#crash-reports
The crash still happens with the VirtualProtect, see bp-5e073847-8799-4ae1-b6b2-d42f22161105.
(In reply to Ting-Yu Chou [:ting] from comment #26) > The crash still happens with the VirtualProtect, see > bp-5e073847-8799-4ae1-b6b2-d42f22161105. This one is odd, open the dump with vs2015 and it has following in ReleaseData(): &iter->header = 0x0e03300c &iter->padding = 0x0e033010 &iter->value = 0x0e034000 The array content does not start from 0x...8 (sizeof nsTArrayHeader is 8) but 0x0e03300c, which iter->value.mData is outside of the readonly page.
I was wrong for comment 27, the address is not ended with 8 (0x0e0300c) because it's the 2nd nsEntry in the array. Note the patch in comment 23 protects only the first page of the nsHttpRequestHead.mHeaders: |nsTArrayHeader|nsEntry.header|padding|nsEntry.value.mData|
Hi Ting, now that the debugging code is on Aurora too, are we seeing anything useful in newer crash reports?
No, there're only 5 EXCEPTION_ACCESS_VIOLATION_WRITE crashes for 52.0a2 which have the address end with 'c' in the past 2 weeks, and none of them are '0c' or 'fc'. I'd like to wait for few more days.
Between Nov 12, 2016 and Dec 12, 2016 , there're 6 EXCEPTION_ACCESS_VIOLATION_WRITE crashes have address ends with 0xfc (the padding shifted offset of the first nsEntry's value.mData in the array): bp-574f8db7-fa10-4ab3-8d04-fb7532161212 F_767234858____________________________ bp-2ca9629f-c698-49b5-8646-3bdd92161206 ImplCycleCollectionTraverse<T> bp-a5cbad60-913d-4396-87a1-ece372161130 js::frontend::MatchOrInsertSemicolonHelper bp-797295af-ff22-4e94-8c77-7ad852161128 kernel32.dll@0x260be bp-a7825c72-f51f-4c48-92fe-f02282161125 js::irregexp::ExecuteCode<T> bp-9189e0d4-2334-4735-97e6-66c672161125 igdusc32.dll | igd10iumd32.dll | d3d11.dll@0x63ddb There aren't many crashes so I am not sure if the root cuase is captured, but bp-2ca9629f-c698-49b5-8646-3bdd92161206 looks suspicious as it has nsTArray_base::ShiftData in the crash stack. I'll take a look at it. I am going to backout the debug patch.  https://crash-stats.mozilla.com/search/?build_id=%3E%3D20161012030211&reason=~EXCEPTION_ACCESS_VIOLATION_WRITE&address=%24fc&product=Firefox&version=53.0a1&version=52.0a2&version=52.0a1&platform=Windows&date=%3E%3D2016-11-12T17%3A17%3A14.000Z&date=%3C2016-12-12T17%3A17%3A14.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=address&_columns=reason#crash-reports
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/62b7071249cf Back out the debug patch (changeset 2eebd44ff2e9). r=me
Approval Request Comment [Feature/Bug causing the regression]: none [User impact if declined]: perf regression [Is this code covered by automated tests?]: not sure [Has the fix been verified in Nightly?]: no [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: no [Why is the change risky/not risky?]: just back out a debug patch [String changes made/needed]: none
(In reply to Ting-Yu Chou [:ting] from comment #31) > There aren't many crashes so I am not sure if the root cuase is captured, > but bp-2ca9629f-c698-49b5-8646-3bdd92161206 looks suspicious as it has > nsTArray_base::ShiftData in the crash stack. I'll take a look at it. It doesn't seem relate to the bug here because: 1) From vs2015, the crash address 0x11ba52fc is __vfptr of the argument aCallback for ImplCycleCollectionTraverse() which is not nsHttpHeaderArray. 2) The stack in crash-stats is different from the one when open the crash dump by vs2015, which does not have nsTArray_base::ShiftData(): xul.dll!ImplCycleCollectionTraverse<nsIBrowserElementAPI>(nsCycleCollectionTraversalCallback & aCallback, nsCOMPtr<nsIBrowserElementAPI> & aField, const char *) Line 1102 C++ xul.dll!nsPurpleBuffer::PurpleBlock::VisitEntries<RemoveSkippableVisitor>(nsPurpleBuffer & aBuffer, RemoveSkippableVisitor & aVisitor) Line 1035 C++ xul.dll!SnowWhiteKiller::SnowWhiteKiller(nsCycleCollector * aCollector) Line 2649 C++ xul.dll!nsTimerImpl::Fire() Line 477 C++ xul.dll!nsTimerEvent::Run() Line 293 C++ xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 1222 C++ xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 96 C++ xul.dll!MessageLoop::RunHandler() Line 226 C++ xul.dll!MessageLoop::Run() Line 206 C++ xul.dll!nsBaseAppShell::Run() Line 158 C++ xul.dll!nsAppShell::Run() Line 262 C++ xul.dll!nsAppStartup::Run() Line 283 C++ xul.dll!XREMain::XRE_mainRun() Line 4467 C++ xul.dll!XREMain::XRE_main(int argc, char * * argv, const nsXREAppData * aAppData) Line 4600 C++ xul.dll!XRE_main(int argc, char * * argv, const nsXREAppData * aAppData, unsigned int aFlags) Line 4692 C++ firefox.exe!wmain(int argc, wchar_t * * argv) Line 115 C++ [External Code] 3) The assembly around ip (0x10ed62a2) does not write to any address: 10ED62A0 mov edi,dword ptr [edx] 10ED62A2 test byte ptr [esi+4],1 10ED62A6 je ImplCycleCollectionTraverse<nsIBrowserElementAPI>+19h (10ED62B5h)
VirtualProtect() doesn't catch noticeable invalid writes, could it be something in kernel mode? I don't know how to proceed without STR...
Comment on attachment 8818143 [details] [diff] [review] v4 backout backout a debugging patch from aurora52
Attachment #8818143 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Ting, where are we with this bug? Should it stay open?
Priority: P1 → P3
The patches I tried here didn't catch any culprits, and I don't have any other ideas how to proceed without STR.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
Removing leave-open keyword from resolved bugs, per :sylvestre.
You need to log in before you can comment on or make changes to this bug.