Closed Bug 886999 Opened 11 years ago Closed 11 years ago

I10t 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js,test_nested_eventloop.html,test_bug509244.html,test_showModalDialog.html,869038.html | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62]

Categories

(Core :: Graphics: Layers, defect)

Platform: x86_64
OS: macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox26 --- disabled
firefox27 --- fixed
firefox-esr24 --- unaffected
b2g-v1.2 --- disabled

People

(Reporter: RyanVM, Assigned: mstange)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: [qa-])

Crash Data

Attachments

(3 files, 3 obsolete files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=24568436&tree=Mozilla-Inbound

Rev5 MacOSX Mountain Lion 10.8 mozilla-inbound opt test crashtest on 2013-06-25 08:02:47 PDT for push 2c9674fbea6a
slave: talos-mtnlion-r5-048

08:09:35     INFO -  REFTEST TEST-START | file:///builds/slave/talos-slave/test/build/tests/reftest/tests/dom/plugins/test/crashtests/752340.html
08:09:35     INFO -  REFTEST TEST-LOAD | file:///builds/slave/talos-slave/test/build/tests/reftest/tests/dom/plugins/test/crashtests/752340.html | 2400 / 2485 (96%)
08:09:35     INFO -  2013-06-25 08:09:35.298 firefox-bin[1092:9707] invalid context
08:09:35     INFO -  2013-06-25 08:09:35.300 firefox-bin[1092:9707] invalid context
08:09:35     INFO -  2013-06-25 08:09:35.301 firefox-bin[1092:9707] invalid context
08:09:44  WARNING -  TEST-UNEXPECTED-FAIL | file:///builds/slave/talos-slave/test/build/tests/reftest/tests/dom/plugins/test/crashtests/752340.html | Exited with code 1 during test run
08:09:44     INFO -  INFO | automation.py | Application ran for: 0:04:37.864573
08:09:44     INFO -  INFO | zombiecheck | Reading PID log: /var/folders/0x/4q6p61sd48v3k3ndq9mwpp3400000w/T/tmpuIyWrdpidlog
08:09:44     INFO -  mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64/1372167785/firefox-25.0a1.en-US.mac.crashreporter-symbols.zip
08:09:44     INFO -  Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64/1372167785/firefox-25.0a1.en-US.mac.crashreporter-symbols.zip
08:10:27  WARNING -  PROCESS-CRASH | file:///builds/slave/talos-slave/test/build/tests/reftest/tests/dom/plugins/test/crashtests/752340.html | application crashed [@ OpenGL + 0x3317]
08:10:27     INFO -  Crash dump filename: /var/folders/0x/4q6p61sd48v3k3ndq9mwpp3400000w/T/tmpTP0nyc/minidumps/FD37963C-505C-4FC2-844A-B5DBDA1BE197.dmp
08:10:27     INFO -  Operating system: Mac OS X
08:10:27     INFO -                    10.8.0 12A269
08:10:27     INFO -  CPU: amd64
08:10:27     INFO -       family 6 model 42 stepping 7
08:10:27     INFO -       8 CPUs
08:10:27     INFO -  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
08:10:27     INFO -  Crash address: 0x4dcfdf00
08:10:27     INFO -  Thread 17 (crashed)
08:10:27     INFO -   0  OpenGL + 0x3317
08:10:27     INFO -      rbx = 0x000000014dcfc000   r12 = 0x000000011e3e5400
08:10:27     INFO -      r13 = 0x000000010eded500   r14 = 0x000000010d580948
08:10:27     INFO -      r15 = 0x000000014dcfc000   rip = 0x00007fff966e3317
08:10:27     INFO -      rsp = 0x000000010d5808f0   rbp = 0x000000010d580900
08:10:27     INFO -      Found by: given as instruction pointer in context
08:10:27     INFO -   1  OpenGL + 0xc269
08:10:27     INFO -      rip = 0x00007fff966ec26a   rsp = 0x000000010d580910
08:10:27     INFO -      rbp = 0x000000010d580930
08:10:27     INFO -      Found by: stack scanning
08:10:27     INFO -   2  AppKit + 0x68ae0
08:10:27     INFO -      rip = 0x00007fff959c8ae1   rsp = 0x000000010d580940
08:10:27     INFO -      rbp = 0x000000010d580950
08:10:27     INFO -      Found by: stack scanning
08:10:27     INFO -   3  XUL!mozilla::gl::GLContextCGL::MakeCurrentImpl(bool) [GLContextProviderCGL.mm:2c9674fbea6a : 133 + 0x13]
08:10:27     INFO -      rip = 0x0000000102ac5d41   rsp = 0x000000010d580960
08:10:27     INFO -      rbp = 0x000000010d580980
08:10:27     INFO -      Found by: stack scanning
08:10:27     INFO -   4  XUL!mozilla::layers::CompositorOGL::MakeCurrent(unsigned int) [GLContext.h:2c9674fbea6a : 189 + 0x2]
08:10:27     INFO -      rbx = 0x0000000000000320   r14 = 0x00000000000003e8
08:10:27     INFO -      rip = 0x0000000102a6cb87   rsp = 0x000000010d580990
08:10:27     INFO -      rbp = 0x000000010d580990
08:10:27     INFO -      Found by: call frame info
08:10:27     INFO -   5  XUL!mozilla::layers::CompositorOGL::BeginFrame(mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, gfxMatrix const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*) [CompositorOGL.cpp:2c9674fbea6a : 832 + 0xb]
08:10:27     INFO -      rbx = 0x0000000000000320   r14 = 0x00000000000003e8
08:10:27     INFO -      rip = 0x0000000102a6a631   rsp = 0x000000010d5809a0
08:10:27     INFO -      rbp = 0x000000010d580a40
08:10:27     INFO -      Found by: call frame info
Very similar to bug 863313.
Crash Signature: [@ OpenGL@0x3317]
See Also: → 887360, 887361
https://tbpl.mozilla.org/php/getParsedLog.php?id=24795766&tree=Fx-Team
Summary: Intermittent dom/plugins/test/crashtests/752340.html | Exited with code 1 during test run | application crashed [@ OpenGL + 0x3317] → Intermittent 752340.html,browser_social_errorPage.js | Exited with code 1 during test run | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08]
https://tbpl.mozilla.org/php/getParsedLog.php?id=24878784&tree=Mozilla-Inbound
Summary: Intermittent 752340.html,browser_social_errorPage.js | Exited with code 1 during test run | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08] → Intermittent 752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js | Exited with code 1 during test run | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62]
Summary: Intermittent 752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js | Exited with code 1 during test run | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62] → Intermittent 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js | Exited with code 1 during test run | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62]
Assignee: nobody → milan
https://tbpl.mozilla.org/php/getParsedLog.php?id=26507267&tree=Fx-Team
Summary: Intermittent 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js | Exited with code 1 during test run | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62] → Intermittent 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js,test_nested_eventloop.html | Exited with code 1 during test run | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62]
Assignee: milan → gabadie
I tried to find similarities between logs, and here is the result:

------------- Mac OS X 10.7.2:
 - GPU: NVIDIA GeForce 320M
 - CPU: Intel core 2 duo
 - crash triggered by /tests/docshell/test/navigation/test_sessionhistory.html
     https://tbpl.mozilla.org/php/getParsedLog.php?id=25326294&full=1&branch=mozilla-aurora#error3
 - crash in /tests/dom/tests/mochitest/bugs/test_bug346659.html
     https://tbpl.mozilla.org/php/getParsedLog.php?id=25676472&full=1&branch=mozilla-inbound#error3
 - crash in file:///Users/cltbld/talos-slave/test/build/tests/reftest/tests/parser/htmlparser/tests/crashtests/30885-1.html
     https://tbpl.mozilla.org/php/getParsedLog.php?id=25916400&full=1&branch=mozilla-inbound#error1
     https://tbpl.mozilla.org/php/getParsedLog.php?id=25692465&full=1&branch=mozilla-inbound#error3

------------- Mac OS X 10.8.0:
 - GPU: Intel HD Graphics 3000
 - CPU: Intel i7
 - crash in /tests/content/media/webspeech/recognition/test/test_nested_eventloop.html
     stack: mozilla::layers::CompositorOGL::BeginFrame
        https://tbpl.mozilla.org/php/getParsedLog.php?id=26491296&full=1&branch=b2g-inbound#error1
        https://tbpl.mozilla.org/php/getParsedLog.php?id=26507267&full=1&branch=fx-team#error1
        https://tbpl.mozilla.org/php/getParsedLog.php?id=25781733&full=1&branch=fx-team#error1
     stack: XUL!mozilla::gl::BasicTextureImage::~BasicTextureImage()
        https://tbpl.mozilla.org/php/getParsedLog.php?id=26500503&full=1&branch=mozilla-inbound#error1

The interesting thing is that only /tests/content/media/webspeech/recognition/test/test_nested_eventloop.html triggers a crash in 10.8.

However, I was not able to reproduce any of these crashes locally.
Summary: Intermittent 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js,test_nested_eventloop.html | Exited with code 1 during test run | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62] → Intermittent 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js,test_nested_eventloop.html | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62]
https://tbpl.mozilla.org/php/getParsedLog.php?id=26859165&tree=Mozilla-Central
Summary: Intermittent 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js,test_nested_eventloop.html | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62] → Intermittent 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js,test_nested_eventloop.html,test_bug509244.html | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62]
Assignee: guillaume.abadie → nobody
Looks like this needs an owner again....
Flags: needinfo?(milan)
We do get "invalid context" and "invalid drawable" messages around the failing test, but that happens even when the test doesn't fail.
Milan, any luck with this?
Summary: Intermittent 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js,test_nested_eventloop.html,test_bug509244.html | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62] → Intermittent 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js,test_nested_eventloop.html,test_bug509244.html,test_showModalDialog.html | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62]
Milan - From last week's engineering meeting, you were responsible for finding an owner for this bug. Any progress?
I was looking at it myself, but nothing to report yet.
Flags: needinfo?(milan)
Assignee: nobody → milan
Assignee: milan → jmuizelaar
I expect that this was likely caused by OMTC.
Highly likely given when it started and that it only happens on m-c/aurora.
Assignee: jmuizelaar → matt.woodrow
I expect this is us being sloppy about the widget's interaction with the compositor, similar to Bug 890997.
A couple of things seem possible:

Apple's OpenGL concurrency guide suggests that we need to lock the GLContext when using it off the main thread: "When you use an NSOpenGLView object with OpenGL calls that are issued from a thread other than the main one, you must set up mutex locking. Mutex locking is necessary because unless you override the default behavior, the main thread may need to communicate with the view for such things as resizing." (https://developer.apple.com/library/mac/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_threading/opengl_threading.html#//apple_ref/doc/uid/TP40001987-CH409-SW1).

lockFocus does setView and makeContextCurrent from the main thread (http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsChildView.mm#3215).

surfaceNeedsUpdate does an update from the main thread (http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsChildView.mm#3239)

Markus, do any of these seem like possible causes of this issue?

I don't know the Cocoa widget too well, but I can't see any other differences between main-thread and OMTC.
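
For illustration, the mutex locking that the Apple guide quoted above asks for can be sketched roughly as follows. This is not the widget code: `FakeGLContext`, `ResizeOnMainThread`, `CompositeOneFrame` and `RunLockedDemo` are hypothetical names, and a `std::mutex` stands in for the per-context lock (`CGLLockContext` / `CGLUnlockContext` in the real code) so that the sketch compiles without Cocoa.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a GL context shared by two threads.
// The std::mutex plays the role of the per-context CGL lock.
struct FakeGLContext {
  std::mutex lock;
  int width = 0;        // drawable state changed by the main thread
  int height = 0;
  long tornFrames = 0;  // frames that saw a half-applied resize
};

// Main-thread side: a resize must change both fields as one unit.
void ResizeOnMainThread(FakeGLContext& ctx, int size) {
  std::lock_guard<std::mutex> guard(ctx.lock);
  ctx.width = size;
  ctx.height = size;
}

// Compositor-thread side: draw one frame against a consistent drawable.
void CompositeOneFrame(FakeGLContext& ctx) {
  std::lock_guard<std::mutex> guard(ctx.lock);
  if (ctx.width != ctx.height) {
    ++ctx.tornFrames;  // can only happen if the lock is not respected
  }
}

// Runs both sides concurrently and returns the number of torn frames.
long RunLockedDemo(int iterations) {
  assert(iterations > 0);
  FakeGLContext ctx;
  std::thread compositor([&ctx, iterations] {
    for (int i = 0; i < iterations; ++i) {
      CompositeOneFrame(ctx);
    }
  });
  for (int i = 1; i <= iterations; ++i) {
    ResizeOnMainThread(ctx, i);
  }
  compositor.join();
  return ctx.tornFrames;
}
```

Calling `RunLockedDemo(1000)` should report zero torn frames, because every access to the shared state goes through the same lock. Whether missing locking is what actually causes the crash in this bug is still only a hypothesis at this point in the thread.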
Attached patch: Cocoa widget changes (obsolete)
Thoughts?

This doesn't seem to break anything (but I only briefly tested it), no idea if it affects the intermittent bug in question.
Attachment #809674 - Flags: review?(mstange)
(In reply to Matt Woodrow (:mattwoodrow) from comment #455)
> lockFocus does setView and makeContextCurrent from the main thread
> (http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsChildView.
> mm#3215).

I wonder why it does that. Do you know? It looks like you added this in bug 561957.

> surfaceNeedsUpdate does an update from the main thread
> (http://mxr.mozilla.org/mozilla-central/source/widget/cocoa/nsChildView.
> mm#3239)

Should we add a CGLLockContext around this update call? Or does update lock internally already?

> Marcus, do any of these seem like possible causes of this issue?

It's possible, sure. I don't know much more than you about this, unfortunately.
Comment on attachment 809674 [details] [diff] [review]
Cocoa widget changes

Looks sensible. But definitely give it a Tryserver run first.
Attachment #809674 - Flags: review?(mstange) → review+
(In reply to Markus Stange [:mstange] from comment #466)
> Looks sensible. But definitely give it a Tryserver run first.

https://tbpl.mozilla.org/?tree=Try&rev=f08a417df540 - wasn't quite quick enough to beat the midday backlog of Mac tests, though.
How many green C's do we want before we're confident this helps?
Looks good enough to me to say that it's worth pushing to inbound with a [leave open] on the whiteboard so we can see how it does there for a couple days.
Looks like comment 476 might have reduced the frequency, but it hasn't eliminated it.
Backed the patch out again for causing deadlocks (bug 920979) 

https://hg.mozilla.org/integration/mozilla-inbound/rev/978f9f45d5a1

I suspect this is probably because of the locking it added, in combination with us doing synchronous transactions sometimes.
(In reply to Matt Woodrow (:mattwoodrow) from comment #493)
> Backed the patch out again for causing deadlocks (bug 920979) 
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/978f9f45d5a1
> 
> I suspect this is probably because of the locking it added, in combination
> with us doing synchronous transactions sometimes.

Backout merged:
https://hg.mozilla.org/mozilla-central/rev/978f9f45d5a1
More tests affected... crashtests/869038.html
Summary: Intermittent 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js,test_nested_eventloop.html,test_bug509244.html,test_showModalDialog.html | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62] → I10t 30885-1.html,752340.html,browser_social_errorPage.js,browser_tabview_bug595930.js,test_nested_eventloop.html,test_bug509244.html,test_showModalDialog.html,869038.html | application crashed [@ OpenGL + 0x3317][@ OpenGL + 0x7e08][@ OpenGL + 0x2a62]
Attached patch: conservative-setview-update
We used to call setView and update on the GL context each time we composite, but I think that's unnecessary. This patch makes us do it only the first time we draw with a new context, when our view size changes, and after the clearDrawable call in forceRefreshOpenGL.
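
A minimal sketch of that idea, not the patch itself: `FakeViewState`, `PreRender`, `ForceRefresh` and `CountUpdatesDemo` are hypothetical names, the context is reduced to an integer id, and a counter stands in for the setView/update calls on the real NSOpenGLContext.

```cpp
#include <cassert>

// Hypothetical per-view bookkeeping. updateCalls counts how often the
// expensive setView/update pair would have been issued.
struct FakeViewState {
  int lastContextId = -1;
  int lastWidth = -1;
  int lastHeight = -1;
  bool needsUpdate = false;  // set after a forced refresh
  int updateCalls = 0;
};

// Stands in for the clearDrawable call in forceRefreshOpenGL.
void ForceRefresh(FakeViewState& view) { view.needsUpdate = true; }

// Called once per composite. Only touches the context when something
// relevant changed since the previous composite.
void PreRender(FakeViewState& view, int contextId, int width, int height) {
  assert(contextId >= 0);
  const bool newContext = contextId != view.lastContextId;
  const bool resized = width != view.lastWidth || height != view.lastHeight;
  if (newContext || resized || view.needsUpdate) {
    ++view.updateCalls;  // setView + update would go here
    view.lastContextId = contextId;
    view.lastWidth = width;
    view.lastHeight = height;
    view.needsUpdate = false;
  }
}

// Six composites, of which only four need a context update.
int CountUpdatesDemo() {
  FakeViewState view;
  PreRender(view, 1, 800, 600);   // first draw with this context: update
  PreRender(view, 1, 800, 600);   // nothing changed: skipped
  PreRender(view, 1, 1024, 768);  // view size changed: update
  PreRender(view, 1, 1024, 768);  // nothing changed: skipped
  ForceRefresh(view);
  PreRender(view, 1, 1024, 768);  // refresh pending: update
  PreRender(view, 2, 1024, 768);  // new context: update
  return view.updateCalls;
}
```

In this sketch `CountUpdatesDemo()` returns 4 for six composites, which is the saving the patch description is after.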
Assignee: matt.woodrow → mstange
Attachment #809674 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #813719 - Flags: review?(matt.woodrow)
Attached patch: add-glcontext-locking
Similar to your patch, but with an additional call to postRender in nsChildView::DoRemoteComposition for basic OMTC, and without the lockFocus stuff. This locks around all setView/update calls, which seems to fix the 10.6 OMTC fullscreen + forceRefreshOpenGL crashes.
Attachment #813720 - Flags: review?(matt.woodrow)
Attachment #813719 - Flags: review?(matt.woodrow) → review+
Comment on attachment 813720 [details] [diff] [review]
add-glcontext-locking

Review of attachment 813720 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good! The original version of this was backed out, looked like it was causing deadlocks and making our tests time out. Any reason to suspect that it won't happen again with this patch?

::: widget/cocoa/nsChildView.mm
@@ +2029,5 @@
>    }
>    NSOpenGLContext *glContext = (NSOpenGLContext *)manager->gl()->GetNativeData(GLContext::NativeGLContext);
>    return [(ChildView*)mView preRender:glContext];
>  }
> +  

Whitespace!
Attachment #813720 - Flags: review?(matt.woodrow) → review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #528)
> Looks good! The original version of this was backed out, looked like it was
> causing deadlocks and making our tests time out. Any reason to suspect that
> it won't happen again with this patch?

Actually, it seems this patch does have the same problem, and I had ignored it. I'll try to reproduce it locally. It occurred on this try push in the first 10.8 opt "oth" run: https://tbpl.mozilla.org/?tree=Try&rev=44e71a747a82
Attached patch: lock-before-viewteardown (obsolete)
That was actually really easy to reproduce (I used "mach mochitest-chrome content/events/test/test_bug617528.xul --repeat 200").

The problem is that nsChildView::Destroy is called between PreRender and PostRender, and it calls TearDownView which nulls out mView, so we can't call postRender on it after the composition.
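
The sequence can be modelled in a few lines, under the assumption (suggested by the later review comments, not shown in this thread) that preRender takes the context lock and postRender releases it. All type and function names here are hypothetical stand-ins, and the lock is reduced to a boolean so the sketch stays single-threaded.

```cpp
#include <cassert>

// Hypothetical stand-in for the GL context lock.
struct FakeContext {
  bool locked = false;
};

// Hypothetical stand-in for the ChildView that owns the context.
struct FakeChildView {
  FakeContext* context;
  void preRender() { context->locked = true; }    // assumed: takes the lock
  void postRender() { context->locked = false; }  // assumed: releases it
};

// Hypothetical stand-in for nsChildView, which forwards to its view.
struct FakeWidget {
  FakeChildView* mView;

  bool PreRender() {
    if (!mView) return false;
    mView->preRender();
    return true;
  }
  void PostRender() {
    if (!mView) return;  // view already gone: nothing to forward to
    mView->postRender();
  }
  void Destroy() { mView = nullptr; }  // models TearDownView nulling mView
};

// Replays the problematic ordering and reports whether the lock leaked.
bool LockLeakedAfterDestroy() {
  FakeContext context;
  FakeChildView view{&context};
  FakeWidget widget{&view};

  const bool started = widget.PreRender();  // composition begins, lock taken
  assert(started);
  widget.Destroy();     // widget destroyed mid-composition
  widget.PostRender();  // cannot reach the view any more
  return context.locked;
}
```

`LockLeakedAfterDestroy()` returns true in this model: the lock taken in PreRender is never released, which would match the deadlocks and timeouts seen on Try.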
Attachment #813815 - Flags: review?(matt.woodrow)
Attached patch: lock-before-viewteardown (obsolete)
Simpler patch that does the same thing.

I'm a bit surprised that the widget is destroyed before composition has finished. I suppose the DestroyCompositor call in the destructor of nsBaseWidget blocks on finishing composition? Maybe we should move compositor destruction into nsBaseWidget::Destroy instead so we could delay nulling out mView until after that has happened?
Attachment #813815 - Attachment is obsolete: true
Attachment #813815 - Flags: review?(matt.woodrow)
Attachment #813863 - Flags: review?(matt.woodrow)
Comment on attachment 813863 [details] [diff] [review]
lock-before-viewteardown

Review of attachment 813863 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/cocoa/nsChildView.mm
@@ +2979,5 @@
> +    // If our GL context is still locked on another thread, wait for that
> +    // thread to unlock it before detaching ourselves from the nsChildView.
> +    // Otherwise the nsChildView doesn't have a chance to call postRender on us.
> +    CGLLockContext((CGLContextObj)[mGLContext CGLContextObj]);
> +    CGLUnlockContext((CGLContextObj)[mGLContext CGLContextObj]);

Don't we need to unlock after we detach?

Otherwise we could potentially get preempted after the unlock, but before the detach, and then have the compositor thread hit preRender.
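
A rough sketch of the ordering being asked for, holding the lock across the detach rather than releasing it first. `FakeView`, its members and `RunTeardownDemo` are hypothetical, and `std::mutex` again stands in for CGLLockContext/CGLUnlockContext on the view's context.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the view and its GL context lock.
struct FakeView {
  std::mutex contextLock;
  bool attached = true;  // guarded by contextLock

  // Compositor thread: returns true with the lock held if the view is
  // still attached, otherwise releases the lock and returns false.
  bool PreRender() {
    contextLock.lock();
    if (!attached) {
      contextLock.unlock();
      return false;
    }
    return true;
  }
  void PostRender() { contextLock.unlock(); }

  // Main thread: waits for any in-flight composite, then detaches while
  // still holding the lock. The unlock happens only after the detach.
  void TearDown() {
    std::lock_guard<std::mutex> guard(contextLock);
    attached = false;
  }
};

// Composites on one thread while tearing down on another. Returns how
// many frames were drawn on a view that had already been detached.
int RunTeardownDemo(int maxFrames) {
  assert(maxFrames > 0);
  FakeView view;
  int framesOnDetachedView = 0;
  std::thread compositor([&view, &framesOnDetachedView, maxFrames] {
    for (int i = 0; i < maxFrames && view.PreRender(); ++i) {
      if (!view.attached) {
        ++framesOnDetachedView;  // would indicate the race described above
      }
      view.PostRender();
    }
  });
  view.TearDown();
  compositor.join();
  return framesOnDetachedView;
}
```

With the detach inside the locked region, `RunTeardownDemo` should always return 0. If TearDown instead locked, unlocked, and only then cleared `attached`, the compositor could pass PreRender in the gap and draw against a view that is being torn down.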
Good point!
Attached patch: lock-around-viewteardown
Maybe like this?
Attachment #813863 - Attachment is obsolete: true
Attachment #813863 - Flags: review?(matt.woodrow)
Attachment #814079 - Flags: review?(matt.woodrow)
Attachment #814079 - Flags: review?(matt.woodrow) → review+
Can we please get this landed?
Flags: needinfo?(mstange)
I attempted to land it yesterday, but it caused timeouts in
/tests/dom/browser-element/mochitest/test_browserElement_oop_CloseFromOpener.html
so you backed it out again.

Example log:
https://tbpl.mozilla.org/php/getParsedLog.php?id=28883670&tree=Mozilla-Inbound&full=1#error0
Flags: needinfo?(mstange)
Depends on: 924771
You're unblocked, land away.
The last reports are aurora/beta only, so it looks like this is fixed!
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
(In reply to Markus Stange [:mstange] from comment #741)
> The last reports are aurora/beta only, so it looks like this is fixed!

What needs uplifting where at this point? Can you please handle the approval requests?
Flags: needinfo?(mstange)
Target Milestone: --- → mozilla27
Whiteboard: [qa-]
Disabled the crash-prone tests on beta.

https://hg.mozilla.org/releases/mozilla-beta/rev/5182944d4232
Flags: needinfo?(mstange)