Closed Bug 1323207 Opened 3 years ago Closed 3 years ago

Add better crash signatures for known places that call JS during painting

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED WONTFIX
mozilla53
Tracking Status
firefox51 --- wontfix
firefox52 --- wontfix
firefox53 --- wontfix

People

(Reporter: mccr8, Assigned: mccr8)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

There are various assertions that check that we don't enter the JS engine during painting. This is useful to find new places that do this, but we have a number of existing unfixed issues (bug 1310335 and various bugs blocking bug 1279086) that hit this, and unfortunately the crash signatures these generate are very generic, and can shift over time. This is bad because it makes it hard for anybody looking at crash lists how frequent these crashes are, or if a crash signature is new or not.

One way to improve these signatures would be to set a flag while we're in InterruptCallback, and do a release assert at known points that the flag is not set (like at the start of nsContentUtils::IsPatternMatching). Anybody fixing one of these issues would have to remove the assertion.
This should get backported to branches with bug 1308039 (52) and maybe bug 1279086 (51).
Is the crash reason field not sufficient?
(In reply to Bill McCloskey (:billm) from comment #2)
> Is the crash reason field not sufficient?

That does not address the problem of distinguishing the different reasons that we are entering the JS engine during painting. (Arguably, the crash reason field is not really sufficient to distinguish painting crashes from other crashes, given that crash triagers mostly look at signatures, but that has not yet been a problem in practice.)

For example, how do we tell apart instances of bug 1310335 from instances of bug 1311843 without looking at individual crash reports, and without having to come up with some custom super search that looks at proto signatures? (And more importantly, how do we tell if somebody adds a new place that starts hitting this assert.)
Comment on attachment 8819024 [details]
Bug 1323207, part 2 - Assert early if we're painting at various points we enter JS.

https://reviewboard.mozilla.org/r/98902/#review99162

::: xpcom/ds/nsObserverList.cpp:109
(Diff revision 1)
> +  MOZ_RELEASE_ASSERT(js::AllowGCBarriers(CycleCollectedJSContext::Get()->Context()),
> +                     "Observers can be implement in JS, so they should not be called during painting.");

I'm worried you're going to find a lot more stuff with this, but let's see what happens :-).
Attachment #8819024 - Flags: review?(wmccloskey) → review+
Comment on attachment 8819023 [details]
Bug 1323207, part 1 - Add method to check AllocGCBarriers.

https://reviewboard.mozilla.org/r/98900/#review99372
Attachment #8819023 - Flags: review?(jcoppeard) → review+
(In reply to Bill McCloskey (:billm) from comment #6)
> I'm worried you're going to find a lot more stuff with this, but let's see
> what happens :-).

Yeah, it may have to get backed out, but I figured I'd try it.
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a04031da95dc
part 1 - Add method to check AllocGCBarriers. r=jonco
https://hg.mozilla.org/integration/autoland/rev/491237dbcb5d
part 2 - Assert early if we're painting at various points we enter JS. r=billm
This breaks, but only on Windows 8 (!?):
  https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=491237dbcb5d29543e2f9bacf37f7eefbbf650b2

It is hitting the observer service assert, but during shutdown, when it does the NS_XPCOM_SHUTDOWN_OBSERVER_ID notification. I have no idea why it is asserting there.
Oh I see, I think maybe this is happening in the GPU process...
Backed out for mass-crashing on Windows 8 x64 debug e10s:

https://hg.mozilla.org/integration/autoland/rev/6cea439a7674fdc4e0f0c6c62f9a035c4f10e633
https://hg.mozilla.org/integration/autoland/rev/e8014faca57975f249d5b4b62331c51c315d7a90

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=491237dbcb5d29543e2f9bacf37f7eefbbf650b2
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=8027895&repo=autoland

10:30:09     INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ nsObserverList::NotifyObservers(nsISupports *,char const *,char16_t const *)]
10:30:09     INFO - Crash dump filename: c:\users\cltbld~1.t-w\appdata\local\temp\tmp8cevqd.mozrunner\minidumps\4cb1637c-5ce6-433c-8757-c42b148dff6b.dmp
10:30:09     INFO - Operating system: Windows NT
10:30:09     INFO -                   6.2.9200 
10:30:09     INFO - CPU: amd64
10:30:09     INFO -      family 6 model 30 stepping 5
10:30:09     INFO -      8 CPUs
10:30:09     INFO - 
10:30:09     INFO - GPU: UNKNOWN
10:30:09     INFO - 
10:30:09     INFO - Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
10:30:09     INFO - Crash address: 0x120
10:30:09     INFO - Assertion: Unknown assertion type 0x00000000
10:30:09     INFO - Process uptime: 17 seconds
10:30:09     INFO - 
10:30:09     INFO - Thread 0 (crashed)
10:30:09     INFO -  0  xul.dll!nsObserverList::NotifyObservers(nsISupports *,char const *,char16_t const *) [nsObserverList.cpp:491237dbcb5d : 110 + 0x7]
10:30:09     INFO -     rax = 0x0000000000000000   rdx = 0x0000005989dab188
10:30:09     INFO -     rcx = 0x000000000000002a   rbx = 0x0000000000000000
10:30:09     INFO -     rsi = 0x0000000000000000   rdi = 0x0000005989dab4c8
10:30:09     INFO -     rbp = 0x000007fc3814b110   rsp = 0x000000598995f2b0
10:30:09     INFO -      r8 = 0x000007fc3814b110    r9 = 0x0000000000000000
10:30:09     INFO -     r10 = 0x000007fc38163cb0   r11 = 0x000000598995f240
10:30:09     INFO -     r12 = 0x000000598995f6b0   r13 = 0x0000000000000000
10:30:09     INFO -     r14 = 0x0000005989dab188   r15 = 0x0000000000000003
10:30:09     INFO -     rip = 0x000007fc337113bc
10:30:09     INFO -     Found by: given as instruction pointer in context
10:30:09     INFO -  1  xul.dll!nsObserverService::NotifyObservers(nsISupports *,char const *,char16_t const *) [nsObserverService.cpp:491237dbcb5d : 281 + 0x11]
10:30:09     INFO -     rbx = 0x0000000000000000   rbp = 0x000007fc3814b110
10:30:09     INFO -     rsp = 0x000000598995f2f0   r12 = 0x000000598995f6b0
10:30:09     INFO -     r13 = 0x0000000000000000   r14 = 0x0000005989dab188
10:30:09     INFO -     r15 = 0x0000000000000003   rip = 0x000007fc33711637
10:30:09     INFO -     Found by: call frame info
10:30:09     INFO -  2  xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager *) [XPCOMInit.cpp:491237dbcb5d : 909 + 0x63]
10:30:09     INFO -     rbx = 0x0000000000000000   rbp = 0x000007fc3814b110
10:30:09     INFO -     rsp = 0x000000598995f330   r12 = 0x000000598995f6b0
10:30:09     INFO -     r13 = 0x0000000000000000   r14 = 0x0000005989dab188
10:30:09     INFO -     r15 = 0x0000000000000003   rip = 0x000007fc3379e382
10:30:09     INFO -     Found by: call frame info
10:30:09     INFO -  3  xul.dll!XRE_InitChildProcess [nsEmbedFunctions.cpp:491237dbcb5d : 760 + 0x9]
10:30:09     INFO -     rbx = 0x0000000000000000   rbp = 0x000007fc3814b110
10:30:09     INFO -     rsp = 0x000000598995f3c0   r12 = 0x000000598995f6b0
10:30:09     INFO -     r13 = 0x0000000000000000   r14 = 0x0000005989dab188
10:30:09     INFO -     r15 = 0x0000000000000003   rip = 0x000007fc36c42589
10:30:09     INFO -     Found by: call frame info
10:30:09     INFO -  4  firefox.exe!content_process_main(int,char * * const) [plugin-container.cpp:491237dbcb5d : 115 + 0x10]
10:30:09     INFO -     rbx = 0x0000000000000000   rbp = 0x000007fc3814b110
10:30:09     INFO -     rsp = 0x000000598995f680   r12 = 0x000000598995f6b0
10:30:09     INFO -     r13 = 0x0000000000000000   r14 = 0x0000005989dab188
10:30:09     INFO -     r15 = 0x0000000000000003   rip = 0x000007f63df61d45
10:30:09     INFO -     Found by: call frame info
10:30:09     INFO -  5  firefox.exe!NS_internal_main(int,char * *,char * *) [nsBrowserApp.cpp:491237dbcb5d : 430 + 0xa]
10:30:09     INFO -     rbx = 0x0000000000000000   rbp = 0x000007fc3814b110
10:30:09     INFO -     rsp = 0x000000598995f6e0   r12 = 0x000000598995f6b0
10:30:09     INFO -     r13 = 0x0000000000000000   r14 = 0x0000005989dab188
10:30:09     INFO -     r15 = 0x0000000000000003   rip = 0x000007f63df618f9
10:30:09     INFO -     Found by: call frame info
10:30:09     INFO -  6  firefox.exe!wmain [nsWindowsWMain.cpp:491237dbcb5d : 115 + 0x14]
10:30:09     INFO -     rbx = 0x0000000000000000   rbp = 0x000007fc3814b110
10:30:09     INFO -     rsp = 0x000000598995f760   r12 = 0x000000598995f6b0
10:30:09     INFO -     r13 = 0x0000000000000000   r14 = 0x0000005989dab188
10:30:09     INFO -     r15 = 0x0000000000000003   rip = 0x000007f63df6266f
10:30:09     INFO -     Found by: call frame info
10:30:09     INFO -  7  firefox.exe!__scrt_common_main_seh [exe_common.inl : 253 + 0x22]
10:30:09     INFO -     rbx = 0x0000000000000000   rbp = 0x000007fc3814b110
10:30:09     INFO -     rsp = 0x000000598995f7c0   r12 = 0x000000598995f6b0
10:30:09     INFO -     r13 = 0x0000000000000000   r14 = 0x0000005989dab188
10:30:09     INFO -     r15 = 0x0000000000000003   rip = 0x000007f63dfa115d
10:30:09     INFO -     Found by: call frame info
10:30:09     INFO -  8  kernel32.dll!BaseThreadInitThunk + 0x1a
10:30:09     INFO -     rbx = 0x0000000000000000   rbp = 0x000007fc3814b110
10:30:09     INFO -     rsp = 0x000000598995f800   r12 = 0x000000598995f6b0
10:30:09     INFO -     r13 = 0x0000000000000000   r14 = 0x0000005989dab188
10:30:09     INFO -     r15 = 0x0000000000000003   rip = 0x000007fc533f167e
10:30:09     INFO -     Found by: call frame info
10:30:09     INFO -  9  ntdll.dll!RtlUserThreadStart + 0x21
10:30:09     INFO -     rbp = 0x000007fc3814b110   rsp = 0x000000598995f830
10:30:09     INFO -     rip = 0x000007fc557ac3f1
10:30:09     INFO -     Found by: stack scanning
10:30:09     INFO - 10  KERNELBASE.dll!GetLegacyComposition + 0x1180
10:30:09     INFO -     rbp = 0x000007fc3814b110   rsp = 0x000000598995f860
10:30:09     INFO -     rip = 0x000007fc52aa09d0
10:30:09     INFO -     Found by: stack scanning
Flags: needinfo?(continuation)
(In reply to Andrew McCreight [:mccr8] from comment #12)
> Oh I see, I think maybe this is happening in the GPU process...

and it looks like it isn't hitting the assert per se. I'm guessing CycleCollectedJSContext::Get()->Context() is doing a null deref or something.
Flags: needinfo?(continuation)
I added a null check on Context(). Hopefully that will be enough.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=253a2861ac5b64c89f929b125c72928485c09e3d
Blocks: 1324237
For now, I'll land it without the observer service changes.
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/40440a0e3073
part 1 - Add method to check AllocGCBarriers. r=jonco
https://hg.mozilla.org/integration/autoland/rev/5f06a88bdc86
part 2 - Assert early if we're painting at various points we enter JS. r=billm
Are you sure CycleCollectedJSContext::Get() isn't returning null?
(In reply to Bill McCloskey (:billm) from comment #21)
> Are you sure CycleCollectedJSContext::Get() isn't returning null?

The stub XPCOM the GPU process uses does start the CC, or so I thought, but that does seem like the most likely remaining explanation.
https://hg.mozilla.org/mozilla-central/rev/40440a0e3073
https://hg.mozilla.org/mozilla-central/rev/5f06a88bdc86
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
(In reply to Andrew McCreight [:mccr8] from comment #22)
> (In reply to Bill McCloskey (:billm) from comment #21)
> > Are you sure CycleCollectedJSContext::Get() isn't returning null?
> 
> The stub XPCOM the GPU process uses does start the CC, or so I thought, but
> that does seem like the most likely remaining explanation.

It doesn't get nulled out at shutdown?
(In reply to Bill McCloskey (:billm) from comment #24)
> It doesn't get nulled out at shutdown?

The observer service is called much earlier than that. Plus things are fine except for the GPU process.
Blocks: 1324308
Depends on: 1324642
Backout by wmccloskey@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/335c5d74c1a1
Back out, part 2 - Assert early if we're painting at various points we enter JS (a=backout)
https://hg.mozilla.org/integration/mozilla-inbound/rev/51d359b72c57
Back out, part 1 - Add method to check AllocGCBarriers (a=backout)
Seems that this was backed out. This is the last week before the merge of FF53. Do we need to get this in?
Flags: needinfo?(continuation)
This was backed out because the patch that was causing this issue was also backed out, so there's no need for it any more.
Flags: needinfo?(continuation)
Resolution: FIXED → WONTFIX
You need to log in before you can comment on or make changes to this bug.