Closed
Bug 1323207
Opened 8 years ago
Closed 8 years ago
Add better crash signatures for known places that call JS during painting
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
WONTFIX
mozilla53
People
(Reporter: mccr8, Assigned: mccr8)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
There are various assertions that check that we don't enter the JS engine during painting. This is useful to find new places that do this, but we have a number of existing unfixed issues (bug 1310335 and various bugs blocking bug 1279086) that hit this, and unfortunately the crash signatures these generate are very generic, and can shift over time. This is bad because it makes it hard for anybody looking at crash lists how frequent these crashes are, or if a crash signature is new or not.
One way to improve these signatures would be to set a flag while we're in InterruptCallback, and do a release assert at known points that the flag is not set (like at the start of nsContentUtils::IsPatternMatching). Anybody fixing one of these issues would have to remove the assertion.
Assignee | ||
Comment 1•8 years ago
|
||
This should get backported to branches with bug 1308039 (52) and maybe bug 1279086 (51).
Is the crash reason field not sufficient?
Assignee | ||
Comment 3•8 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #2)
> Is the crash reason field not sufficient?
That does not address the problem of distinguishing the different reasons that we are entering the JS engine during painting. (Arguably, the crash reason field is not really sufficient to distinguish painting crashes from other crashes, given that crash triagers mostly look at signatures, but that has not yet been a problem in practice.)
For example, how do we tell apart instances of bug 1310335 from instances of bug 1311843 without looking at individual crash reports, and without having to come up with some custom super search that looks at proto signatures? (And more importantly, how do we tell if somebody adds a new place that starts hitting this assert.)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 6•8 years ago
|
||
mozreview-review |
Comment on attachment 8819024 [details]
Bug 1323207, part 2 - Assert early if we're painting at various points we enter JS.
https://reviewboard.mozilla.org/r/98902/#review99162
::: xpcom/ds/nsObserverList.cpp:109
(Diff revision 1)
> + MOZ_RELEASE_ASSERT(js::AllowGCBarriers(CycleCollectedJSContext::Get()->Context()),
> + "Observers can be implement in JS, so they should not be called during painting.");
I'm worried you're going to find a lot more stuff with this, but let's see what happens :-).
Attachment #8819024 -
Flags: review?(wmccloskey) → review+
Comment 7•8 years ago
|
||
mozreview-review |
Comment on attachment 8819023 [details]
Bug 1323207, part 1 - Add method to check AllocGCBarriers.
https://reviewboard.mozilla.org/r/98900/#review99372
Attachment #8819023 -
Flags: review?(jcoppeard) → review+
Assignee | ||
Comment 8•8 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #6)
> I'm worried you're going to find a lot more stuff with this, but let's see
> what happens :-).
Yeah, it may have to get backed out, but I figured I'd try it.
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a04031da95dc
part 1 - Add method to check AllocGCBarriers. r=jonco
https://hg.mozilla.org/integration/autoland/rev/491237dbcb5d
part 2 - Assert early if we're painting at various points we enter JS. r=billm
Assignee | ||
Comment 10•8 years ago
|
||
Assignee | ||
Comment 11•8 years ago
|
||
This breaks, but only on Windows 8 (!?):
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=491237dbcb5d29543e2f9bacf37f7eefbbf650b2
It is hitting the observer service assert, but during shutdown, when it does the NS_XPCOM_SHUTDOWN_OBSERVER_ID notification. I have no idea why it is asserting there.
Assignee | ||
Comment 12•8 years ago
|
||
Oh I see, I think maybe this is happening in the GPU process...
![]() |
||
Comment 13•8 years ago
|
||
Backed out for mass-crashing on Windows 8 x64 debug e10s:
https://hg.mozilla.org/integration/autoland/rev/6cea439a7674fdc4e0f0c6c62f9a035c4f10e633
https://hg.mozilla.org/integration/autoland/rev/e8014faca57975f249d5b4b62331c51c315d7a90
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=491237dbcb5d29543e2f9bacf37f7eefbbf650b2
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=8027895&repo=autoland
10:30:09 INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ nsObserverList::NotifyObservers(nsISupports *,char const *,char16_t const *)]
10:30:09 INFO - Crash dump filename: c:\users\cltbld~1.t-w\appdata\local\temp\tmp8cevqd.mozrunner\minidumps\4cb1637c-5ce6-433c-8757-c42b148dff6b.dmp
10:30:09 INFO - Operating system: Windows NT
10:30:09 INFO - 6.2.9200
10:30:09 INFO - CPU: amd64
10:30:09 INFO - family 6 model 30 stepping 5
10:30:09 INFO - 8 CPUs
10:30:09 INFO -
10:30:09 INFO - GPU: UNKNOWN
10:30:09 INFO -
10:30:09 INFO - Crash reason: EXCEPTION_ACCESS_VIOLATION_READ
10:30:09 INFO - Crash address: 0x120
10:30:09 INFO - Assertion: Unknown assertion type 0x00000000
10:30:09 INFO - Process uptime: 17 seconds
10:30:09 INFO -
10:30:09 INFO - Thread 0 (crashed)
10:30:09 INFO - 0 xul.dll!nsObserverList::NotifyObservers(nsISupports *,char const *,char16_t const *) [nsObserverList.cpp:491237dbcb5d : 110 + 0x7]
10:30:09 INFO - rax = 0x0000000000000000 rdx = 0x0000005989dab188
10:30:09 INFO - rcx = 0x000000000000002a rbx = 0x0000000000000000
10:30:09 INFO - rsi = 0x0000000000000000 rdi = 0x0000005989dab4c8
10:30:09 INFO - rbp = 0x000007fc3814b110 rsp = 0x000000598995f2b0
10:30:09 INFO - r8 = 0x000007fc3814b110 r9 = 0x0000000000000000
10:30:09 INFO - r10 = 0x000007fc38163cb0 r11 = 0x000000598995f240
10:30:09 INFO - r12 = 0x000000598995f6b0 r13 = 0x0000000000000000
10:30:09 INFO - r14 = 0x0000005989dab188 r15 = 0x0000000000000003
10:30:09 INFO - rip = 0x000007fc337113bc
10:30:09 INFO - Found by: given as instruction pointer in context
10:30:09 INFO - 1 xul.dll!nsObserverService::NotifyObservers(nsISupports *,char const *,char16_t const *) [nsObserverService.cpp:491237dbcb5d : 281 + 0x11]
10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110
10:30:09 INFO - rsp = 0x000000598995f2f0 r12 = 0x000000598995f6b0
10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188
10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007fc33711637
10:30:09 INFO - Found by: call frame info
10:30:09 INFO - 2 xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager *) [XPCOMInit.cpp:491237dbcb5d : 909 + 0x63]
10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110
10:30:09 INFO - rsp = 0x000000598995f330 r12 = 0x000000598995f6b0
10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188
10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007fc3379e382
10:30:09 INFO - Found by: call frame info
10:30:09 INFO - 3 xul.dll!XRE_InitChildProcess [nsEmbedFunctions.cpp:491237dbcb5d : 760 + 0x9]
10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110
10:30:09 INFO - rsp = 0x000000598995f3c0 r12 = 0x000000598995f6b0
10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188
10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007fc36c42589
10:30:09 INFO - Found by: call frame info
10:30:09 INFO - 4 firefox.exe!content_process_main(int,char * * const) [plugin-container.cpp:491237dbcb5d : 115 + 0x10]
10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110
10:30:09 INFO - rsp = 0x000000598995f680 r12 = 0x000000598995f6b0
10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188
10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007f63df61d45
10:30:09 INFO - Found by: call frame info
10:30:09 INFO - 5 firefox.exe!NS_internal_main(int,char * *,char * *) [nsBrowserApp.cpp:491237dbcb5d : 430 + 0xa]
10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110
10:30:09 INFO - rsp = 0x000000598995f6e0 r12 = 0x000000598995f6b0
10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188
10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007f63df618f9
10:30:09 INFO - Found by: call frame info
10:30:09 INFO - 6 firefox.exe!wmain [nsWindowsWMain.cpp:491237dbcb5d : 115 + 0x14]
10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110
10:30:09 INFO - rsp = 0x000000598995f760 r12 = 0x000000598995f6b0
10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188
10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007f63df6266f
10:30:09 INFO - Found by: call frame info
10:30:09 INFO - 7 firefox.exe!__scrt_common_main_seh [exe_common.inl : 253 + 0x22]
10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110
10:30:09 INFO - rsp = 0x000000598995f7c0 r12 = 0x000000598995f6b0
10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188
10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007f63dfa115d
10:30:09 INFO - Found by: call frame info
10:30:09 INFO - 8 kernel32.dll!BaseThreadInitThunk + 0x1a
10:30:09 INFO - rbx = 0x0000000000000000 rbp = 0x000007fc3814b110
10:30:09 INFO - rsp = 0x000000598995f800 r12 = 0x000000598995f6b0
10:30:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000005989dab188
10:30:09 INFO - r15 = 0x0000000000000003 rip = 0x000007fc533f167e
10:30:09 INFO - Found by: call frame info
10:30:09 INFO - 9 ntdll.dll!RtlUserThreadStart + 0x21
10:30:09 INFO - rbp = 0x000007fc3814b110 rsp = 0x000000598995f830
10:30:09 INFO - rip = 0x000007fc557ac3f1
10:30:09 INFO - Found by: stack scanning
10:30:09 INFO - 10 KERNELBASE.dll!GetLegacyComposition + 0x1180
10:30:09 INFO - rbp = 0x000007fc3814b110 rsp = 0x000000598995f860
10:30:09 INFO - rip = 0x000007fc52aa09d0
10:30:09 INFO - Found by: stack scanning
Flags: needinfo?(continuation)
Assignee | ||
Comment 14•8 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #12)
> Oh I see, I think maybe this is happening in the GPU process...
and it looks like it isn't hitting the assert per se. I'm guessing CycleCollectedJSContext::Get()->Context() is doing a null deref or something.
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(continuation)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 16•8 years ago
|
||
I added a null check on Context(). Hopefully that will be enough.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=253a2861ac5b64c89f929b125c72928485c09e3d
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 19•8 years ago
|
||
For now, I'll land it without the observer service changes.
Comment 20•8 years ago
|
||
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/40440a0e3073
part 1 - Add method to check AllocGCBarriers. r=jonco
https://hg.mozilla.org/integration/autoland/rev/5f06a88bdc86
part 2 - Assert early if we're painting at various points we enter JS. r=billm
Are you sure CycleCollectedJSContext::Get() isn't returning null?
Assignee | ||
Comment 22•8 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #21)
> Are you sure CycleCollectedJSContext::Get() isn't returning null?
The stub XPCOM the GPU process uses does start the CC, or so I thought, but that does seem like the most likely remaining explanation.
Comment 23•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/40440a0e3073
https://hg.mozilla.org/mozilla-central/rev/5f06a88bdc86
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
(In reply to Andrew McCreight [:mccr8] from comment #22)
> (In reply to Bill McCloskey (:billm) from comment #21)
> > Are you sure CycleCollectedJSContext::Get() isn't returning null?
>
> The stub XPCOM the GPU process uses does start the CC, or so I thought, but
> that does seem like the most likely remaining explanation.
It doesn't get nulled out at shutdown?
Assignee | ||
Comment 25•8 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #24)
> It doesn't get nulled out at shutdown?
The observer service is called much earlier than that. Plus things are fine except for the GPU process.
Comment 26•8 years ago
|
||
Backout by wmccloskey@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/335c5d74c1a1
Back out, part 2 - Assert early if we're painting at various points we enter JS (a=backout)
https://hg.mozilla.org/integration/mozilla-inbound/rev/51d359b72c57
Back out, part 1 - Add method to check AllocGCBarriers (a=backout)
Comment 27•8 years ago
|
||
Seems that this was backed out. This is the last week before the merge of FF53. Do we need to get this in?
Flags: needinfo?(continuation)
Assignee | ||
Comment 28•8 years ago
|
||
This was backed out because the patch that was causing this issue was also backed out, so there's no need for it any more.
Flags: needinfo?(continuation)
Resolution: FIXED → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•