Intermittent browser_dbg_variables-view-popup-05.js,browser_dbg_variables-view-popup-06.js | application crashed [@ nsPresContext::IsDOMPaintEventPending()]

RESOLVED FIXED in Firefox 29, Firefox OS v1.4

Status

()

Core
Layout
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: RyanVM, Assigned: mats)

Tracking

({crash, intermittent-failure})

Trunk
mozilla30
crash, intermittent-failure
Points:
---

Firefox Tracking Flags

(firefox28 disabled, firefox29 fixed, firefox30 fixed, firefox-esr24 unaffected, b2g-v1.3 disabled, b2g-v1.4 fixed)

Details

Attachments

(5 attachments, 1 obsolete attachment)

(Reporter)

Description

4 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=31504024&tree=Mozilla-Inbound

Rev5 MacOSX Mountain Lion 10.8 mozilla-inbound opt test mochitest-browser-chrome on 2013-12-05 08:36:53 PST for push bfb81cac0108
slave: talos-mtnlion-r5-036

08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Enough 'cursorActivity' editor events have been fired.
08:56:09     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Caret updated: 24, 1
08:56:09     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Current editor caret position: 24, 1
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | The correct caret position has been set.
08:56:09     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Debugger event 'Debugger:FetchedScopes' fired: 1 time(s).
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Enough 'Debugger:FetchedScopes' panel events have been fired.
08:56:09     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Waiting for event: 'popupshown' on [object XULElement].
08:56:09     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Waiting for debugger event: 'Debugger:FetchedBubbleProperties' to fire: 1 time(s).
08:56:09     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Debugger event 'Debugger:FetchedBubbleProperties' fired: 1 time(s).
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Enough 'Debugger:FetchedBubbleProperties' panel events have been fired.
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | There should be one variables view container added to the tooltip.
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | There should be one scope with no header displayed.
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | There should be one variable with no header displayed.
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | There should be 2 properties displayed.
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | The first property's name is correct.
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | The first property's value is correct.
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | The second property's name is correct.
08:56:09     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | The second property's value is correct.
08:56:09     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Destroying the specified debugger.
08:56:09     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Waiting for event: 'Debugger:Shutdown' on [object ChromeWindow].
08:56:09     INFO -  TEST-INFO | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | Waiting for event: 'destroyed' on [object Object].
08:56:10  WARNING -  TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | application terminated with exit code 256
08:56:10     INFO -  INFO | runtests.py | Application ran for: 0:15:23.614338
08:56:10     INFO -  INFO | zombiecheck | Reading PID log: /var/folders/hm/k3yvp6510jl9n7krlfq8rsnh00000w/T/tmpHNvcXGpidlog
08:56:10     INFO -  mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64/1386258733/firefox-28.0a1.en-US.mac.crashreporter-symbols.zip
08:57:01  WARNING -  PROCESS-CRASH | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js | application crashed [@ nsPresContext::IsDOMPaintEventPending()]
08:57:01     INFO -  Crash dump filename: /var/folders/hm/k3yvp6510jl9n7krlfq8rsnh00000w/T/tmpZeMRIu/minidumps/7B7F9319-96DA-4408-8724-9302A82C1964.dmp
08:57:01     INFO -  Operating system: Mac OS X
08:57:01     INFO -                    10.8.0 12A269
08:57:01     INFO -  CPU: amd64
08:57:01     INFO -       family 6 model 42 stepping 7
08:57:01     INFO -       8 CPUs
08:57:01     INFO -  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
08:57:01     INFO -  Crash address: 0x18
08:57:01     INFO -  Thread 0 (crashed)
08:57:01     INFO -   0  XUL!nsPresContext::IsDOMPaintEventPending() [nsPresContext.h:bfb81cac0108 : 180 + 0x0]
08:57:01     INFO -      rbx = 0x0000000000000000   r12 = 0x0000000000000000
08:57:01     INFO -      r13 = 0x0000000000000000   r14 = 0x000000013dcb4800
08:57:01     INFO -      r15 = 0x0000000110108000   rip = 0x0000000102971803
08:57:01     INFO -      rsp = 0x00007fff5fbefde0   rbp = 0x00007fff5fbefe00
08:57:01     INFO -      Found by: given as instruction pointer in context
08:57:01     INFO -   1  XUL!NotifyDidPaintSubdocumentCallback [nsPresContext.cpp:bfb81cac0108 : 2368 + 0x7]
08:57:01     INFO -      rbx = 0x000000013dcb4800   r12 = 0x0000000000000000
08:57:01     INFO -      r13 = 0x0000000000000000   r14 = 0x00007fff5fbefec0
08:57:01     INFO -      r15 = 0x0000000110108000   rip = 0x0000000102976aec
08:57:01     INFO -      rsp = 0x00007fff5fbefe10   rbp = 0x00007fff5fbefe20
08:57:01     INFO -      Found by: call frame info
08:57:01     INFO -   2  XUL!SubDocHashEnum [nsDocument.cpp:bfb81cac0108 : 7677 + 0x5]
08:57:01     INFO -      rbx = 0x000000013297a240   r12 = 0x0000000000000000
08:57:01     INFO -      r13 = 0x0000000000000000   r14 = 0x000000013779a760
08:57:01     INFO -      r15 = 0x0000000110108000   rip = 0x00000001023be845
08:57:01     INFO -      rsp = 0x00007fff5fbefe30   rbp = 0x00007fff5fbefe30
08:57:01     INFO -      Found by: call frame info
08:57:01     INFO -   3  XUL!PL_DHashTableEnumerate [pldhash.cpp:bfb81cac0108 : 632 + 0xf]
08:57:01     INFO -      rbx = 0x000000013297a240   r12 = 0x0000000000000000
08:57:01     INFO -      r13 = 0x0000000000000000   r14 = 0x000000013779a760
08:57:01     INFO -      r15 = 0x0000000110108000   rip = 0x00000001014ab8b5
08:57:01     INFO -      rsp = 0x00007fff5fbefe40   rbp = 0x00007fff5fbefe90
08:57:01     INFO -      Found by: call frame info
08:57:01     INFO -   4  XUL!nsDocument::EnumerateSubDocuments(bool (*)(nsIDocument*, void*), void*) [nsDocument.cpp:bfb81cac0108 : 7687 + 0xf]
08:57:01     INFO -      rbx = 0x000000013d685be0   r12 = 0x0000000000000002
08:57:01     INFO -      r13 = 0x0000000135c09ab0   r14 = 0x0000000000000002
08:57:01     INFO -      r15 = 0x0000000110108000   rip = 0x00000001023be81c
08:57:01     INFO -      rsp = 0x00007fff5fbefea0   rbp = 0x00007fff5fbefeb0
08:57:01     INFO -      Found by: call frame info
(Reporter)

Comment 1

4 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=31506641&tree=Mozilla-Inbound
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Matt, are we missing some null checks?
Component: DOM → Layout
Somewhere http://hg.mozilla.org/mozilla-central/annotate/1ad9af3a2ab8/layout/base/nsPresContext.cpp#l132

(In general Get* methods may return null)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Reporter)

Comment 59

4 years ago
(In reply to Olli Pettay [:smaug] from comment #49)
> Matt, are we missing some null checks?
Flags: needinfo?(matspal)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 63

4 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #59)
> (In reply to Olli Pettay [:smaug] from comment #49)
> > Matt, are we missing some null checks?

I'm not Matt, but I'll comment anyway. ;-)

I don't see any MozAfterPaint listeners in
browser_dbg_variables-view-popup-06.js nor in the preceeding -05.js,
http://mxr.mozilla.org/mozilla-central/source/browser/devtools/debugger/test/browser_dbg_variables-view-popup-06.js
http://mxr.mozilla.org/mozilla-central/source/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js
http://mxr.mozilla.org/mozilla-central/source/browser/devtools/debugger/test/doc_frame-parameters.html?force=1
so I'm puzzled as to how we got into this code (on the stack) in the first
place.  We shouldn't be here if there are no paint listeners, AIUI.
So maybe there's some earlier test that lingers with a listener, but why is
that window painting then?

That said, we should probably add some protective measures to avoid
detached pres contexts in this code.  A null-check at the crashing
line feels a bit late, it should probably be somewhat earlier.

I would really like to understand the nature of this problem before
wallpapering over it though, so I would suggest we first add code
to detect the condition and print out some data.  At a minimum, we
could dump the URL from the mDocument, and the same from any ancestor
pres context that we can reach.  And perhaps some pres shell state
for each as well.
Flags: needinfo?(matspal)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(In reply to Mats Palmgren (:mats) from comment #63)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #59)
> > (In reply to Olli Pettay [:smaug] from comment #49)
> > > Matt, are we missing some null checks?
> 
> I'm not Matt, but I'll comment anyway. ;-)
> 
> I don't see any MozAfterPaint listeners in
> browser_dbg_variables-view-popup-06.js nor in the preceeding -05.js,
> http://mxr.mozilla.org/mozilla-central/source/browser/devtools/debugger/test/
> browser_dbg_variables-view-popup-06.js
> http://mxr.mozilla.org/mozilla-central/source/browser/devtools/debugger/test/
> browser_dbg_variables-view-popup-05.js
> http://mxr.mozilla.org/mozilla-central/source/browser/devtools/debugger/test/
> doc_frame-parameters.html?force=1
> so I'm puzzled as to how we got into this code (on the stack) in the first
> place.  We shouldn't be here if there are no paint listeners, AIUI.
> So maybe there's some earlier test that lingers with a listener, but why is
> that window painting then?
> 
> That said, we should probably add some protective measures to avoid
> detached pres contexts in this code.  A null-check at the crashing
> line feels a bit late, it should probably be somewhat earlier.
> 
> I would really like to understand the nature of this problem before
> wallpapering over it though, so I would suggest we first add code
> to detect the condition and print out some data.  At a minimum, we
> could dump the URL from the mDocument, and the same from any ancestor
> pres context that we can reach.  And perhaps some pres shell state
> for each as well.

Do you want to write this patch Mats? I'm not really sure what state would be useful to dump.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 75

4 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #68)
> Do you want to write this patch Mats? I'm not really sure what state would
> be useful to dump.

Not really.  The important thing is the URL (from nsPresContext::Document()),
hopefully that will give us some clues.  From the shell (if it's non-null),
the value of IsDestroying(), IsFrozen(), IsActive(), IsVisible(),
IsNeverPainting(), and GetRootFrame().
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Created attachment 8348288 [details] [diff] [review]
Dump pres context data

Something like this?
Attachment #8348288 - Flags: review?(matspal)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 90

4 years ago
Comment on attachment 8348288 [details] [diff] [review]
Dump pres context data

LGTM, r=mats
Attachment #8348288 - Flags: review?(matspal) → review+
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
https://hg.mozilla.org/integration/mozilla-inbound/rev/c867eb65bd5e
Whiteboard: [leave open]
Comment hidden (Treeherder Robot)
(Reporter)

Comment 100

4 years ago
(In reply to TBPL Robot from comment #99)

This is from a run after comment 98 landed on inbound.
Flags: needinfo?(matt.woodrow)
Comment hidden (Treeherder Robot)
https://hg.mozilla.org/mozilla-central/rev/c867eb65bd5e
(Assignee)

Comment 103

4 years ago
We forgot to add "shell" to the arg list, or remove the %p, but nevermind,
after shifting the values I think we have this:
IsDestroying(0) IsFrozen(0) IsActive(1) IsVisible(1) \
  IsNeverPainting(0) GetRootFrame(non-null)
for all three shells.

The URLs are:
  .../devtools/framework/toolbox.xul 
    .../devtools/debugger.xul 
      data:text/html...

The data: URL contains ".CodeMirror { width: 100%; height: 100% !important;
line-height: normal!important}" so it probably comes from this:
http://mxr.mozilla.org/mozilla-central/source/browser/devtools/sourceeditor/editor.js#53
CM_IFRAME is only used here:
http://mxr.mozilla.org/mozilla-central/source/browser/devtools/sourceeditor/editor.js#271
which appears to be something that injects an <iframe> into the content page for some
devtools purpose.

The last two test messages in the log:
... Waiting for event: 'Debugger:Shutdown' on [object ChromeWindow].
... Waiting for event: 'destroyed' on [object Object].
indicates that we're are here:
 resumeDebuggerThenCloseAndFinish
   closeDebuggerAndFinish
     teardown
http://mxr.mozilla.org/mozilla-central/source/browser/devtools/debugger/test/head.js#499
the resumeDebuggerThenCloseAndFinish call comes from the test:
http://mxr.mozilla.org/mozilla-central/source/browser/devtools/debugger/test/browser_dbg_variables-view-popup-05.js
(Assignee)

Comment 104

4 years ago
It seems to me that we should prevent further paint events before starting
to detach the pres context, maybe by setting the shell's IsActive(0) or
IsVisible(0) or something like that.  I'm not sure what those flags are
for exactly.

What will help us debug this further?  Would the JS stack be of any use?
Perhaps dump some nsDocumentViewer/nsDocShell state?
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Adding ni?roc for Mats' questions in Comment 104.
Flags: needinfo?(matt.woodrow) → needinfo?(roc)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
It looks to me like in NotifyDidPaintSubdocumentCallback, the call to NotifyDidPaintForSubtree works fine but results in the destruction of the prescontext, causing the crash on the next line where we call IsDOMPaintEventPending(). So I think pc should be a strong reference or else we should re-get it from the document. I prefer the latter.
Flags: needinfo?(roc)
Comment hidden (Treeherder Robot)
Hmm, but it shouldn't be able to destroy the prescontext unless scriptrunners run synchronously, and there is an nsAutoScriptBlocker on the stack in nsViewManager::ProcessPendingUpdatesForView. How about instrumenting prescontext destruction and NotifyDidPaintSubdocumentCallback to confirm that the prescontext really is being destroyed between those two lines? If that's happening, we can incrementally narrow down the destruction point using more instrumentation.
Comment hidden (Treeherder Robot)
(Assignee)

Comment 130

4 years ago
Created attachment 8351731 [details] [diff] [review]
Diagnostic code - part 2

This also dumps the PresContexts on the call stack.
And any destructor calls inside a NotifyDidPaint* method.
I also made the ancestor traversal in IsDOMPaintEventPending try the
parent document's shell's context, like GetDisplayRootPresContext() does:
http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresContext.cpp#1299
Attachment #8351731 - Flags: review?(roc)
Attachment #8351731 - Flags: review?(roc) → review+
(Assignee)

Comment 131

4 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/1fd44cab7b14
(Assignee)

Comment 132

4 years ago
I just looked through all of the reported failures so far and they are *all* from
OSX 10.8 Opt builds.
Comment hidden (Treeherder Robot)
https://hg.mozilla.org/mozilla-central/rev/1fd44cab7b14
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 139

4 years ago
Created attachment 8355668 [details] [diff] [review]
Part 1, Don't schedule paint related events in a detached pres context.

After a few dozen Try jobs I think I finally have figured this out.
There are two separate bugs, the first one is that we call
FireDOMPaintEvent on detached pres contexts, this happens when:
 1. EnsureEventualDidPaintEvent is called and while the timer
    (mNotifyDidPaintTimer) is running the pres context is detached
 2. NotifyDidPaintForSubtree is called and a DelayedFireDOMPaintEvent
    script runner created, but before it's executed the pres context
    is detached

The fix for 1 is to call CancelDidPaintTimer() as part of detaching
the pres context.  The fix for 2 is to check that the pres context
is still attached in the Run() method of the script runner.

That's the first bug.  The second one was a lot harder to find...
Assignee: nobody → matspal
Attachment #8355668 - Flags: review?(tnikkel)
(Assignee)

Updated

4 years ago
Attachment #8351731 - Attachment description: part 2 → Diagnostic code - part 2
(Assignee)

Comment 140

4 years ago
Created attachment 8355670 [details] [diff] [review]
Part 2, Ignore updates for detached views.

The second bug is that we process updates for detached views.
(Note that this is an independent concept of "detached" than the
one above.)  These are SubDocumentFrame child views that are
in limbo on the frame loader -- see nsSubDocumentFrame::DestroyFrom,
BeginSwapDocShellsForViews for details.  There is a delay until
the nsHideViewer script runner runs and we need to ignore these
views in ProcessPendingUpdatesForView in the meantime.
The fix here is to mark those views with a flag.

With these two fixes, there were no crashes in 300 runs or so:
https://tbpl.mozilla.org/?tree=Try&rev=90093d2701f6
(it would typically crash once per 30 runs or so)
Attachment #8355670 - Flags: review?(tnikkel)
(Assignee)

Comment 141

4 years ago
Regarding the comment in the first patch:
+  // XXXmats maybe also CancelApplyPluginGeometryTimer(); ?

We had some crashes a year or two ago where we got a null root pres context
while processing plugin geometry updates.  We added null-checks to wallpaper
over it but I seem to recall we didn't fully understand why it occurred.
Cancelling this timer *might* be the proper fix.
(Assignee)

Comment 142

4 years ago
BTW, I'm intentionally not removing the diagnostic code yet, just in case
I missed something.  I'll remove that after this has baked for a few weeks
without any incidents.
Comment on attachment 8355670 [details] [diff] [review]
Part 2, Ignore updates for detached views.

Any overlap between this and IsNeverPainting? Should we also bail in Refresh like we do for IsNeverPainting? Can we combine them into one check?
Comment hidden (Treeherder Robot)
(Assignee)

Comment 145

4 years ago
Created attachment 8356130 [details] [diff] [review]
Part 2, Disable painting for the shell while the child views are detached.

That's a good idea, thanks.

https://tbpl.mozilla.org/?tree=Try&rev=453fbc61f425
https://tbpl.mozilla.org/?tree=Try&rev=aa2c9a8d27f7
https://tbpl.mozilla.org/?tree=Try&rev=78dc3746cee8
Attachment #8355670 - Attachment is obsolete: true
Attachment #8355670 - Flags: review?(tnikkel)
Attachment #8356130 - Flags: review?(tnikkel)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)

Updated

4 years ago
Summary: Intermittent browser_dbg_variables-view-popup-05.js | application terminated with exit code 256 | application crashed [@ nsPresContext::IsDOMPaintEventPending()] → Intermittent browser_dbg_variables-view-popup-05.js | application crashed [@ nsPresContext::IsDOMPaintEventPending()]
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(In reply to Mats Palmgren (:mats) from comment #140)
> The second bug is that we process updates for detached views.
> (Note that this is an independent concept of "detached" than the
> one above.)  These are SubDocumentFrame child views that are
> in limbo on the frame loader -- see nsSubDocumentFrame::DestroyFrom,
> BeginSwapDocShellsForViews for details.  There is a delay until
> the nsHideViewer script runner runs and we need to ignore these
> views in ProcessPendingUpdatesForView in the meantime.
> The fix here is to mark those views with a flag.

What is the stack that gets us into ProcessPendingUpdatesForView for a detached view?
(In reply to Mats Palmgren (:mats) from comment #139)
> Created attachment 8355668 [details] [diff] [review]
> Part 1, Don't schedule paint related events in a detached pres context.
> 
> After a few dozen Try jobs I think I finally have figured this out.
> There are two separate bugs, the first one is that we call
> FireDOMPaintEvent on detached pres contexts, this happens when:
>  1. EnsureEventualDidPaintEvent is called and while the timer
>     (mNotifyDidPaintTimer) is running the pres context is detached
>  2. NotifyDidPaintForSubtree is called and a DelayedFireDOMPaintEvent
>     script runner created, but before it's executed the pres context
>     is detached
> 
> The fix for 1 is to call CancelDidPaintTimer() as part of detaching
> the pres context.  The fix for 2 is to check that the pres context
> is still attached in the Run() method of the script runner.
> 
> That's the first bug.  The second one was a lot harder to find...

I see that we use EnumerateSubdocuments on the documents to fire these paint events, the subdocument tree as kept by the nsIDocument and the view tree are not always in sync (as you found out). We have usually used the frame and by extension the view tree as the source of truth for painting relating things, as that is what actually determines what gets painted.

GetDisplayRootPresContext seems like an oddball to me as it seems to just duplicate GetRootPresContext except with a weird jump to nsIDocuments. It has added in http://hg.mozilla.org/mozilla-central/rev/79cd5721b7b4 with the message "Make GetParentPresContext() succeed when the current PresContext has no frames". Since then GetParentPresContext has been changed to deal with null root frames. So it would be nice to get rid of GetDisplayRootPresContext and just use GetRootPresContext.

Similarly could we get rid of traversing the nsIDocument hierarchy and the associated woes and just traverse views maybe?
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 213

4 years ago
(In reply to Timothy Nikkel (:tn) from comment #192)
> GetDisplayRootPresContext seems like an oddball to me as it seems to just
> duplicate GetRootPresContext except with a weird jump to nsIDocuments.

Yeah, I don't know what it's for to be honest.

> Similarly could we get rid of traversing the nsIDocument hierarchy and the
> associated woes and just traverse views maybe?

I don't want to get into architectural changes in this bug though.
And I think most changes in the current patches, if not all, would still
apply after such architectural changes, so I'd like to land them as is.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)

Updated

4 years ago
Summary: Intermittent browser_dbg_variables-view-popup-05.js | application crashed [@ nsPresContext::IsDOMPaintEventPending()] → Intermittent browser_dbg_variables-view-popup-05.js,browser_dbg_variables-view-popup-06.js | application crashed [@ nsPresContext::IsDOMPaintEventPending()]
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 250

4 years ago
Comment on attachment 8355668 [details] [diff] [review]
Part 1, Don't schedule paint related events in a detached pres context.

roc, as I said in comment 213,

I don't want to get into architectural changes in this bug though.
And I think most changes in the current patches, if not all, would still
apply after such architectural changes, so I'd like to land them as is.
(I'm also hoping that the fixes will help fix some of our nastier
crashes during painting that we have reports on)
Attachment #8355668 - Flags: review?(tnikkel) → review?(roc)
(Assignee)

Updated

4 years ago
Attachment #8356130 - Flags: review?(tnikkel) → review?(roc)
Attachment #8355668 - Flags: review?(roc) → review+
Attachment #8356130 - Flags: review?(roc) → review+
(Assignee)

Comment 251

4 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/ae57a2bfa2c7
https://hg.mozilla.org/integration/mozilla-inbound/rev/5bcdeb39c890
OS: Mac OS X → All
Hardware: x86_64 → All
Whiteboard: [leave open]
(Assignee)

Comment 252

4 years ago
(Filed bug 967296 for removing the diagnostic code after this test has been green for a while)
(Reporter)

Comment 253

4 years ago
https://hg.mozilla.org/mozilla-central/rev/ae57a2bfa2c7
https://hg.mozilla.org/mozilla-central/rev/5bcdeb39c890
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
(Reporter)

Updated

4 years ago
Depends on: 967758
Comment hidden (Treeherder Robot)
(Reporter)

Comment 255

4 years ago
I assume we don't want to land these patches on the release branches directly, but it would be good to get the fixes for this backported when we have a proper fix in hand.
status-b2g-v1.3: --- → affected
status-firefox28: --- → affected
status-firefox29: --- → affected
status-firefox30: --- → fixed
status-firefox-esr24: --- → unaffected
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Reporter)

Comment 259

4 years ago
Created attachment 8373322 [details] [diff] [review]
part 1 for branches

This is Part 1 with the patches from bug 967758 and bug 968174 folded in for hopeful uplift. Applies fine to Aurora and Beta.

Part 2 also applies without issue.
Attachment #8373322 - Flags: review?(matspal)
(Assignee)

Comment 260

4 years ago
Comment on attachment 8373322 [details] [diff] [review]
part 1 for branches

Thanks.  The changes here are a bit risky though so we probably don't
want them on Beta.
Attachment #8373322 - Flags: review?(matspal) → review+
(Reporter)

Comment 261

4 years ago
Which means I'm going to be starring this on b2g28 until at least September :(
(Reporter)

Comment 262

4 years ago
Please do request uplift approval when you're ready, though.
(Assignee)

Comment 263

4 years ago
Comment on attachment 8373322 [details] [diff] [review]
part 1 for branches

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: crashes (rare)
Testing completed (on m-c, etc.): on m-c since 2014-02-04
Risk to taking this patch (and alternatives if risky): medium
String or IDL/UUID changes made by this patch: none
Attachment #8373322 - Flags: approval-mozilla-aurora?
(Reporter)

Comment 264

4 years ago
Can we consider letting this bake on trunk/Aurora for a week or two and then request Beta uplift if all looks good?
Attachment #8373322 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
the sooner in aurora, the better. It will get other kind of testers in this branch.
(Reporter)

Comment 266

4 years ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/0112f6fba74b
https://hg.mozilla.org/releases/mozilla-aurora/rev/df5f5cd12cc0
status-b2g-v1.4: --- → fixed
status-firefox29: affected → fixed
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Reporter)

Comment 277

4 years ago
Mats, this has been on trunk for nearly 2 weeks now and Aurora for a week with no known issues. All of the failures since this landed on Aurora have been on 28-based branches. Given that, are you more comfortable with the idea of getting this on 28? We would still a solid month's worth of testing on beta at this point in the release cycle.
Flags: needinfo?(matspal)
(Assignee)

Comment 278

4 years ago
Not really, I still think the risk/benefit ratio is too high for Beta.
I have sympathy for your burden of starring this for a long time though...
Can we simply disable this test on 28-branches?
Flags: needinfo?(matspal)
(Reporter)

Comment 279

4 years ago
Yup, that was exactly my plan B :-)
(Reporter)

Comment 280

4 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/533e8bcf1404
status-firefox28: affected → disabled
(Reporter)

Comment 281

4 years ago
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/533e8bcf1404
(Reporter)

Updated

4 years ago
status-b2g-v1.3: affected → disabled
You need to log in before you can comment on or make changes to this bug.