Closed
Bug 1145295
Opened 9 years ago
Closed 9 years ago
Intermittent test_persistence.xul,test_bug583948.xul | Assertion failure: aTransactionId > mPendingTransaction, at gfx\layers\ipc/CompositorParent.cpp:1124
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
FIXED
mozilla42
People
(Reporter: RyanVM, Assigned: kats)
References
Details
(Keywords: assertion, crash, intermittent-failure, Whiteboard: [gfx-noted])
Attachments
(3 files)
1.05 KB,
patch
|
nical
:
review+
|
Details | Diff | Splinter Review |
1.48 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
1.17 KB,
patch
|
lmandel
:
approval-mozilla-aurora+
lmandel
:
approval-mozilla-beta+
lmandel
:
approval-mozilla-esr38+
|
Details | Diff | Splinter Review |
07:59:32 INFO - 1522 INFO TEST-START | dom/xul/test/test_bug583948.xul 07:59:32 INFO - : file c:\builds\moz2_slave\m-cen-w32-d-000000000000000000\build\src\gfx\layers\ipc/CompositorParent.cpp, line 484 07:59:33 INFO - Assertion failure: aTransactionId > mPendingTransaction, at c:\builds\moz2_slave\m-cen-w32-d-000000000000000000\build\src\gfx\layers\ipc/CompositorParent.cpp:1124 07:59:33 INFO - #01: mozilla::layers::LayerTransactionParent::RecvUpdate(nsTArray<mozilla::layers::Edit> &&,unsigned __int64 const &,mozilla::layers::TargetConfig const &,nsTArray<mozilla::layers::PluginWindowData> &&,bool const &,bool const &,unsigned int const &,bool const &,mozilla::TimeStamp const &,nsTArray<mozilla::layers::EditReply> *) [gfx/layers/ipc/LayerTransactionParent.cpp:592] 07:59:33 INFO - #02: mozilla::layers::LayerTransactionParent::RecvUpdateNoSwap(nsTArray<mozilla::layers::Edit> &&,unsigned __int64 const &,mozilla::layers::TargetConfig const &,nsTArray<mozilla::layers::PluginWindowData> &&,bool const &,bool const &,unsigned int const &,bool const &,mozilla::TimeStamp const &) [gfx/layers/ipc/LayerTransactionParent.cpp:206] 07:59:33 INFO - #03: mozilla::layers::PLayerTransactionParent::OnMessageReceived(IPC::Message const &) [obj-firefox/ipc/ipdl/PLayerTransactionParent.cpp:517] 07:59:33 INFO - #04: mozilla::layers::PCompositorParent::OnMessageReceived(IPC::Message const &) [obj-firefox/ipc/ipdl/PCompositorParent.cpp:483] 07:59:33 INFO - #05: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const &) [ipc/glue/MessageChannel.cpp:1226] 07:59:33 INFO - #06: mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message const &) [ipc/glue/MessageChannel.cpp:1152] 07:59:33 INFO - #07: mozilla::ipc::MessageChannel::OnMaybeDequeueOne() [ipc/glue/MessageChannel.cpp:1136] 07:59:33 INFO - #08: MessageLoop::RunTask(Task *) [ipc/chromium/src/base/message_loop.cc:362] 07:59:33 INFO - #09: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const &) [ipc/chromium/src/base/message_loop.cc:372] 07:59:33 INFO - #10: MessageLoop::DoWork() [ipc/chromium/src/base/message_loop.cc:456] 07:59:33 INFO - #11: base::MessagePumpForUI::DoRunLoop() [ipc/chromium/src/base/message_pump_win.cc:217] 07:59:33 INFO - #12: base::MessagePumpWin::Run(base::MessagePump::Delegate *) [ipc/chromium/src/base/message_pump_win.h:78] 07:59:33 INFO - #13: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:233] 07:59:33 INFO - #14: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:227] 07:59:33 INFO - #15: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:201] 07:59:33 INFO - #16: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:173] 07:59:33 INFO - #17: `anonymous namespace'::ThreadFunc(void *) [ipc/chromium/src/base/platform_thread_win.cc:27] 07:59:33 INFO - #18: kernel32 + 0xb713 07:59:33 INFO - TEST-INFO | Main app process: exit status 1 07:59:33 INFO - 1523 ERROR TEST-UNEXPECTED-FAIL | dom/xul/test/test_bug583948.xul | application terminated with exit code 1
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•9 years ago
|
Component: Graphics → Graphics: Layers
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 10•9 years ago
|
||
Marking as noted. Kats or I will investigate when we get a chance.
Whiteboard: [gfx-noted]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 17•9 years ago
|
||
Interesting that this seems to be windows-only. I traced the code paths and it's not clear why this assertion would fail. I think the first step is to just get some more info - let's see if the transaction IDs are close to each other or just random junk. Logging patch attached to determine this.
Attachment #8585540 -
Flags: review?(nical.bugzilla)
Comment 18•9 years ago
|
||
Comment on attachment 8585540 [details] [diff] [review] Logging Review of attachment 8585540 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/CompositorParent.cpp @@ +1122,5 @@ > +#ifdef DEBUG > + if (aTransactionId <= mPendingTransaction) { > + // Logging added to help diagnose why we're triggering the assert below. > + // See bug 1145295 > + printf_stderr("CRASH: aTransactionId %llu <= mPendingTransaction %llu\n", nit: I think that it's more correct to write something like: printf_stderr("... aTransactionId %" PRIo64 " <= ..." using the constant that comes with #include <inttypes.h> Actually, looks like long long unsigned isn't specified to be always 64 bit, but happens to be so in practice. Whatever.
Attachment #8585540 -
Flags: review?(nical.bugzilla) → review+
Comment 19•9 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #18) > ::: gfx/layers/ipc/CompositorParent.cpp > @@ +1122,5 @@ > > +#ifdef DEBUG > > + if (aTransactionId <= mPendingTransaction) { > > + // Logging added to help diagnose why we're triggering the assert below. > > + // See bug 1145295 > > + printf_stderr("CRASH: aTransactionId %llu <= mPendingTransaction %llu\n", > > nit: I think that it's more correct to write something like: > > printf_stderr("... aTransactionId %" PRIo64 " <= ..." > > using the constant that comes with #include <inttypes.h> > > Actually, looks like long long unsigned isn't specified to be always 64 bit, > but happens to be so in practice. Whatever. The reason for using the constants in <inttypes.h> isn't that "unsigned long long" might not be 64-bit, it's that these variables are of type "uint64_t", and "uint64_t" isn't "unsigned long long" on all platforms - on some platforms, such as 64-bit Linux, it's "unsigned long". The format checker (which we've enabled for printf_stderr in bug 965022) will complain if you try to format an "unsigned long" variable with a format flag ("%llu") that is meant for "unsigned long long", even if they're the same size on the current platform.
Assignee | ||
Comment 20•9 years ago
|
||
Yeah, llu fails to build on 64-bit linux with -Werror, I replaced it with PRIu64 and verified on try. Just waiting for inbound to reopen.
Assignee | ||
Comment 21•9 years ago
|
||
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2187ff67e058 Inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/1a68731ea592
Keywords: leave-open
Assignee | ||
Updated•9 years ago
|
Attachment #8585540 -
Flags: checkin+
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Comment 24•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/1a68731ea592
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 26•9 years ago
|
||
Don't you just love it when adding printfs makes bugs go away?
Assignee | ||
Comment 27•9 years ago
|
||
I backed out the debug log from comment 21 since this assertion failure hasn't showed up since I landed it. Either it's perma-fixed or it was fixed by the landing, I guess we'll find out!
Comment 28•9 years ago
|
||
Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/599c1081ca33
Assignee | ||
Comment 29•9 years ago
|
||
Seems to be solidly gone.
Assignee | ||
Updated•9 years ago
|
Attachment #8585540 -
Flags: checkin+
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Updated•9 years ago
|
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 36•9 years ago
|
||
Since this bug is back (although very intermittently, I'm relanding the logging patch from comment 24. I'll wait longer this time.
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open
Assignee | ||
Comment 39•9 years ago
|
||
Bug 1184375 has more of these crashes. From comments 1-4 on that bug, the logging that I added reports: CRASH: aTransactionId 1 <= mPendingTransaction 1203 CRASH: aTransactionId 1 <= mPendingTransaction 1215 CRASH: aTransactionId 1 <= mPendingTransaction 1209 CRASH: aTransactionId 1 <= mPendingTransaction 1182 So it looks like the transaction id counter is getting reset on the child side somehow.
Assignee | ||
Comment 40•9 years ago
|
||
Since the refresh driver (which is the implementation of the transaction id allocator) doesn't seem to reset its counter, only increment or decrement it, that leaves two possible options: either we're getting a fresh new ClientLayerManager hooked up to the same parent-side compositor, or the ClientLayerManager's id allocator is getting reset to a different refresh driver. Matt, it looks you added this transaction ID machinery as part of bug 854421 - any ideas on what might be happening here? If not I can add some more logging and drill down.
Blocks: 854421
Flags: needinfo?(matt.woodrow)
Comment 41•9 years ago
|
||
The latter of those two options seems much more likely. If we reload/change the top level document, then we'd still have the same widget (and associated layer manager/compositor), but we'd probably get a new refresh driver.
Flags: needinfo?(matt.woodrow)
Assignee | ||
Comment 42•9 years ago
|
||
If it's a legitimate scenario, should I just update the assert to this instead? MOZ_ASSERT(aTransactionId == 1 || aTransactionId > mPendingTransaction) test_bug583948.xul does in fact open a new window and trigger a reload on it 6 times, so it seems to fit with your theory. test_persistence.xul doesn't exactly do a reload, but it does open a window twice with the same URL so it might be triggering a similar mechanism.
Flags: needinfo?(matt.woodrow)
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → bugmail.mozilla
Assignee | ||
Comment 44•9 years ago
|
||
Attachment #8634750 -
Flags: review?(matt.woodrow)
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open
Updated•9 years ago
|
Attachment #8634750 -
Flags: review?(matt.woodrow) → review+
Summary: Intermittent test_bug583948.xul | Assertion failure: aTransactionId > mPendingTransaction, at gfx\layers\ipc/CompositorParent.cpp:1124 → Intermittent test_persistence.xul,test_bug583948.xul | Assertion failure: aTransactionId > mPendingTransaction, at gfx\layers\ipc/CompositorParent.cpp:1124
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Comment 48•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/3a1822dd5a48
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
status-firefox42:
--- → fixed
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Reporter | ||
Comment 49•9 years ago
|
||
Please request Aurora/Beta/esr38 approval on this when you get a chance.
status-firefox40:
--- → affected
status-firefox41:
--- → affected
status-firefox-esr31:
--- → unaffected
status-firefox-esr38:
--- → affected
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 51•9 years ago
|
||
[Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: [Feature/regressing bug #]: code was added in bug 854421 but the assertion itself may have started firing at some later time due to other changes [User impact if declined]: none [Fix Landed on Version]: 42 m-c [Describe test coverage new/current, TreeHerder]: the code is exercised by various tests (i.e. the ones that were intermittently failing) [Risk to taking this patch (and alternatives if risky)]: no risk, this patch affects a MOZ_ASSERT which is debug-only. It's basically a fix for intermittent debug test failures. [String or UUID changes made by this patch]: none
Attachment #8635340 -
Flags: approval-mozilla-esr38?
Attachment #8635340 -
Flags: approval-mozilla-beta?
Attachment #8635340 -
Flags: approval-mozilla-aurora?
Comment 52•9 years ago
|
||
Comment on attachment 8635340 [details] [diff] [review] Patch for aurora/beta/esr38 Simple fix. Beta+ Aurora+ ESR38+
Attachment #8635340 -
Flags: approval-mozilla-esr38?
Attachment #8635340 -
Flags: approval-mozilla-esr38+
Attachment #8635340 -
Flags: approval-mozilla-beta?
Attachment #8635340 -
Flags: approval-mozilla-beta+
Attachment #8635340 -
Flags: approval-mozilla-aurora?
Attachment #8635340 -
Flags: approval-mozilla-aurora+
Updated•9 years ago
|
tracking-firefox-esr38:
--- → 40+
Reporter | ||
Comment 53•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/6c85547293b4
Reporter | ||
Comment 54•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-beta/rev/47bff045e3fa
Reporter | ||
Comment 55•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-esr38/rev/7a2063f186aa
You need to log in
before you can comment on or make changes to this bug.
Description
•