Closed Bug 1145295 Opened 9 years ago Closed 9 years ago

Intermittent test_persistence.xul,test_bug583948.xul | Assertion failure: aTransactionId > mPendingTransaction, at gfx\layers\ipc/CompositorParent.cpp:1124

Categories

(Core :: Graphics: Layers, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox40 --- fixed
firefox41 --- fixed
firefox42 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 40+ fixed

People

(Reporter: RyanVM, Assigned: kats)

References

Details

(Keywords: assertion, crash, intermittent-failure, Whiteboard: [gfx-noted])

Attachments

(3 files)

07:59:32 INFO - 1522 INFO TEST-START | dom/xul/test/test_bug583948.xul
07:59:32 INFO - : file c:\builds\moz2_slave\m-cen-w32-d-000000000000000000\build\src\gfx\layers\ipc/CompositorParent.cpp, line 484
07:59:33 INFO - Assertion failure: aTransactionId > mPendingTransaction, at c:\builds\moz2_slave\m-cen-w32-d-000000000000000000\build\src\gfx\layers\ipc/CompositorParent.cpp:1124
07:59:33 INFO - #01: mozilla::layers::LayerTransactionParent::RecvUpdate(nsTArray<mozilla::layers::Edit> &&,unsigned __int64 const &,mozilla::layers::TargetConfig const &,nsTArray<mozilla::layers::PluginWindowData> &&,bool const &,bool const &,unsigned int const &,bool const &,mozilla::TimeStamp const &,nsTArray<mozilla::layers::EditReply> *) [gfx/layers/ipc/LayerTransactionParent.cpp:592]
07:59:33 INFO - #02: mozilla::layers::LayerTransactionParent::RecvUpdateNoSwap(nsTArray<mozilla::layers::Edit> &&,unsigned __int64 const &,mozilla::layers::TargetConfig const &,nsTArray<mozilla::layers::PluginWindowData> &&,bool const &,bool const &,unsigned int const &,bool const &,mozilla::TimeStamp const &) [gfx/layers/ipc/LayerTransactionParent.cpp:206]
07:59:33 INFO - #03: mozilla::layers::PLayerTransactionParent::OnMessageReceived(IPC::Message const &) [obj-firefox/ipc/ipdl/PLayerTransactionParent.cpp:517]
07:59:33 INFO - #04: mozilla::layers::PCompositorParent::OnMessageReceived(IPC::Message const &) [obj-firefox/ipc/ipdl/PCompositorParent.cpp:483]
07:59:33 INFO - #05: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const &) [ipc/glue/MessageChannel.cpp:1226]
07:59:33 INFO - #06: mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message const &) [ipc/glue/MessageChannel.cpp:1152]
07:59:33 INFO - #07: mozilla::ipc::MessageChannel::OnMaybeDequeueOne() [ipc/glue/MessageChannel.cpp:1136]
07:59:33 INFO - #08: MessageLoop::RunTask(Task *) [ipc/chromium/src/base/message_loop.cc:362]
07:59:33 INFO - #09: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const &) [ipc/chromium/src/base/message_loop.cc:372]
07:59:33 INFO - #10: MessageLoop::DoWork() [ipc/chromium/src/base/message_loop.cc:456]
07:59:33 INFO - #11: base::MessagePumpForUI::DoRunLoop() [ipc/chromium/src/base/message_pump_win.cc:217]
07:59:33 INFO - #12: base::MessagePumpWin::Run(base::MessagePump::Delegate *) [ipc/chromium/src/base/message_pump_win.h:78]
07:59:33 INFO - #13: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:233]
07:59:33 INFO - #14: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:227]
07:59:33 INFO - #15: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:201]
07:59:33 INFO - #16: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:173]
07:59:33 INFO - #17: `anonymous namespace'::ThreadFunc(void *) [ipc/chromium/src/base/platform_thread_win.cc:27]
07:59:33 INFO - #18: kernel32 + 0xb713
07:59:33 INFO - TEST-INFO | Main app process: exit status 1
07:59:33 INFO - 1523 ERROR TEST-UNEXPECTED-FAIL | dom/xul/test/test_bug583948.xul | application terminated with exit code 1
Component: Graphics → Graphics: Layers
See Also: → 1146692
See Also: 1146692
Marking as noted. Kats or I will investigate when we get a chance.
Whiteboard: [gfx-noted]
Attached patch LoggingSplinter Review
Interesting that this seems to be windows-only. I traced the code paths and it's not clear why this assertion would fail. I think the first step is to just get some more info - let's see if the transaction IDs are close to each other or just random junk. Logging patch attached to determine this.
Attachment #8585540 - Flags: review?(nical.bugzilla)
Comment on attachment 8585540 [details] [diff] [review]
Logging

Review of attachment 8585540 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/CompositorParent.cpp
@@ +1122,5 @@
> +#ifdef DEBUG
> +  if (aTransactionId <= mPendingTransaction) {
> +    // Logging added to help diagnose why we're triggering the assert below.
> +    // See bug 1145295
> +    printf_stderr("CRASH: aTransactionId %llu <= mPendingTransaction %llu\n",

nit: I think that it's more correct to write something like:

printf_stderr("... aTransactionId %" PRIo64 " <= ..."

using the constant that comes with #include <inttypes.h>

Actually, looks like long long unsigned isn't specified to be always 64 bit, but happens to be so in practice. Whatever.
Attachment #8585540 - Flags: review?(nical.bugzilla) → review+
(In reply to Nicolas Silva [:nical] from comment #18)
> ::: gfx/layers/ipc/CompositorParent.cpp
> @@ +1122,5 @@
> > +#ifdef DEBUG
> > +  if (aTransactionId <= mPendingTransaction) {
> > +    // Logging added to help diagnose why we're triggering the assert below.
> > +    // See bug 1145295
> > +    printf_stderr("CRASH: aTransactionId %llu <= mPendingTransaction %llu\n",
> 
> nit: I think that it's more correct to write something like:
> 
> printf_stderr("... aTransactionId %" PRIo64 " <= ..."
> 
> using the constant that comes with #include <inttypes.h>
> 
> Actually, looks like long long unsigned isn't specified to be always 64 bit,
> but happens to be so in practice. Whatever.

The reason for using the constants in <inttypes.h> isn't that "unsigned long long" might not be 64-bit, it's that these variables are of type "uint64_t", and "uint64_t" isn't "unsigned long long" on all platforms - on some platforms, such as 64-bit Linux, it's "unsigned long". The format checker (which we've enabled for printf_stderr in bug 965022) will complain if you try to format an "unsigned long" variable with a format flag ("%llu") that is meant for "unsigned long long", even if they're the same size on the current platform.
Yeah, llu fails to build on 64-bit linux with -Werror, I replaced it with PRIu64 and verified on try. Just waiting for inbound to reopen.
Don't you just love it when adding printfs makes bugs go away?
I backed out the debug log from comment 21 since this assertion failure hasn't showed up since I landed it. Either it's perma-fixed or it was fixed by the landing, I guess we'll find out!
Seems to be solidly gone.
Status: NEW → RESOLVED
Closed: 9 years ago
Keywords: leave-open
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Since this bug is back (although very intermittently, I'm relanding the logging patch from comment 24. I'll wait longer this time.
Bug 1184375 has more of these crashes. From comments 1-4 on that bug, the logging that I added reports:

CRASH: aTransactionId 1 <= mPendingTransaction 1203
CRASH: aTransactionId 1 <= mPendingTransaction 1215
CRASH: aTransactionId 1 <= mPendingTransaction 1209
CRASH: aTransactionId 1 <= mPendingTransaction 1182

So it looks like the transaction id counter is getting reset on the child side somehow.
Since the refresh driver (which is the implementation of the transaction id allocator) doesn't seem to reset its counter, only increment or decrement it, that leaves two possible options: either we're getting a fresh new ClientLayerManager hooked up to the same parent-side compositor, or the ClientLayerManager's id allocator is getting reset to a different refresh driver.

Matt, it looks you added this transaction ID machinery as part of bug 854421 - any ideas on what might be happening here? If not I can add some more logging and drill down.
Blocks: 854421
Flags: needinfo?(matt.woodrow)
The latter of those two options seems much more likely.

If we reload/change the top level document, then we'd still have the same widget (and associated layer manager/compositor), but we'd probably get a new refresh driver.
Flags: needinfo?(matt.woodrow)
If it's a legitimate scenario, should I just update the assert to this instead?

MOZ_ASSERT(aTransactionId == 1 || aTransactionId > mPendingTransaction)

test_bug583948.xul does in fact open a new window and trigger a reload on it 6 times, so it seems to fit with your theory. test_persistence.xul doesn't exactly do a reload, but it does open a window twice with the same URL so it might be triggering a similar mechanism.
Flags: needinfo?(matt.woodrow)
Yep, that sounds reasonable.
Flags: needinfo?(matt.woodrow)
Assignee: nobody → bugmail.mozilla
Attachment #8634750 - Flags: review?(matt.woodrow) → review+
Summary: Intermittent test_bug583948.xul | Assertion failure: aTransactionId > mPendingTransaction, at gfx\layers\ipc/CompositorParent.cpp:1124 → Intermittent test_persistence.xul,test_bug583948.xul | Assertion failure: aTransactionId > mPendingTransaction, at gfx\layers\ipc/CompositorParent.cpp:1124
https://hg.mozilla.org/mozilla-central/rev/3a1822dd5a48
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Please request Aurora/Beta/esr38 approval on this when you get a chance.
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
[Feature/regressing bug #]: code was added in bug 854421 but the assertion itself may have started firing at some later time due to other changes
[User impact if declined]: none
[Fix Landed on Version]: 42 m-c
[Describe test coverage new/current, TreeHerder]: the code is exercised by various tests (i.e. the ones that were intermittently failing)
[Risk to taking this patch (and alternatives if risky)]: no risk, this patch affects a MOZ_ASSERT which is debug-only. It's basically a fix for intermittent debug test failures.
[String or UUID changes made by this patch]: none
Attachment #8635340 - Flags: approval-mozilla-esr38?
Attachment #8635340 - Flags: approval-mozilla-beta?
Attachment #8635340 - Flags: approval-mozilla-aurora?
Comment on attachment 8635340 [details] [diff] [review]
Patch for aurora/beta/esr38

Simple fix. Beta+ Aurora+ ESR38+
Attachment #8635340 - Flags: approval-mozilla-esr38?
Attachment #8635340 - Flags: approval-mozilla-esr38+
Attachment #8635340 - Flags: approval-mozilla-beta?
Attachment #8635340 - Flags: approval-mozilla-beta+
Attachment #8635340 - Flags: approval-mozilla-aurora?
Attachment #8635340 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.