Closed Bug 1024272 Opened 10 years ago Closed 10 years ago

Win64 TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\toolkit\crashreporter\test\unit\test_crash_AsyncShutdown.js | test failed (with xpcshell return code: 0),

Categories

(Toolkit :: General, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

RESOLVED FIXED
mozilla35

People

(Reporter: away, Unassigned)

https://tbpl.mozilla.org/php/getParsedLog.php?id=41403121&tree=Date

Not sure what the actual error is.
It's this line:

19:14:53     INFO -  ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"OS.File: flush I/O queued before profile-before-change","state":{"launched":true,"shutdown":false,"worker":true,"pendingReset":false,"latestSent":["Mon Jun 09 2014 19:14:53 GMT-0700 (Pacific Standard Time)","getCurrentDirectory"],"latestReceived":["Mon Jun 09 2014 19:14:53 GMT-0700 (Pacific Standard Time)",{"ok":{"string":"C:\\\\slave\\\\test\\\\build\\\\tests\\\\xpcshell\\\\tests\\\\toolkit\\\\crashreporter\\\\test\\\\unit"},"id":1,"durationMs":null,"timeStamps":{"entered":1402366491220,"loaded":1402366491236}}],"messagesSent":0,"messagesReceived":1,"messagesQueued":1,"DEBUG":false},"filename":"resource://gre/modules/osfile/osfile_async_front.jsm","lineNumber":1515}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
Oh wait, I'm wrong. This test is checking to make sure that the crash does happen properly. So:

19:14:53  WARNING -  TEST-UNEXPECTED-FAIL | C:/slave/test/build/tests/xpcshell/tests/toolkit/crashreporter/test/unit/head_crashreporter.js | No minidump found! - See following stack:

I expect this has something to do with crash reporting on Win64.
Blocks: 1033110
test_crash_AsyncShutdown.js:

function run_test() {
  do_crash(setup_crash, after_crash);
  do_crash(setup_osfile_crash_noerror, after_osfile_crash_noerror);
  do_crash(setup_osfile_crash_exn, after_osfile_crash_exn);
}

The first do_crash test is fine. The second and third fail. The NS_DebugBreak never hits Breakpad's ExceptionHandler::HandleException. Instead I just get my default postmortem debugger.
It passes with --disable-ion. I think this is something to do with JIT frames and stack walking.

I can see ntdll!RtlDispatchException calling ntdll!RtlLookupFunctionEntry in a loop, looking up each frame from the stack that produced the "int 3". After the frame for mozjs!js::jit::DoCallFallback, there is an RWX address that I assume is JIT code, and after that, the addresses go off into nowhere. The walk never finds its way back to "regular" code. I guess that stops it from finding the UnhandledExceptionFilter?
Oh, that's bad! Maybe our x86-64 JIT doesn't set up the ABI stackwalk properly?
Flags: needinfo?(jdemooij)
http://msdn.microsoft.com/en-us/library/ft9x1kdx.aspx:

> For dynamically generated functions [JIT compilers], the runtime to support these functions must 
> either use RtlInstallFunctionTableCallback or RtlAddFunctionTable to provide this information to 
> the operating system. Failure to do so will result in unreliable exception handling and debugging
> of processes.

I don't see either of those in DXR.
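To make the failure mode concrete, here is a minimal sketch of a table-based stack walk. It is a toy model, not the real unwinder: makeFunctionTable, walkStack, and all addresses are hypothetical, and the add/lookup methods only play the roles of RtlAddFunctionTable and RtlLookupFunctionEntry. The sketch simply gives up at the first frame without an entry; the real walker behaves differently in detail, but the outcome is the one described above, in that it never gets back to the regular frames beyond the JIT code.

```javascript
// Toy model only: none of these functions call a real Windows API.
// The names and addresses are hypothetical stand-ins for the concepts above.

function makeFunctionTable() {
  const entries = [];
  return {
    // Plays the role of RtlAddFunctionTable: register unwind info for a range.
    add(name, begin, end) {
      entries.push({ name, begin, end });
    },
    // Plays the role of RtlLookupFunctionEntry: find the range covering pc.
    lookup(pc) {
      return entries.find((e) => pc >= e.begin && pc < e.end) || null;
    },
  };
}

// Walk the frames from the faulting one outward, giving up at the first
// frame that has no registered unwind info.
function walkStack(table, framePcs) {
  const visited = [];
  for (const pc of framePcs) {
    const entry = table.lookup(pc);
    if (entry === null) {
      return { visited, complete: false, lostAt: pc };
    }
    visited.push(entry.name);
  }
  return { visited, complete: true, lostAt: null };
}

// Statically compiled code has its entries registered already.
const table = makeFunctionTable();
table.add("NS_DebugBreak", 0x1000, 0x1100);
table.add("js::jit::DoCallFallback", 0x2000, 0x2100);
table.add("xpcshell main", 0x3000, 0x3100);

// Innermost frame first; 0x9040 lies inside unregistered JIT code.
const framePcs = [0x1010, 0x2020, 0x9040, 0x3030];

const before = walkStack(table, framePcs);
console.log(before.complete, before.visited);

// Registering the JIT range lets the walk reach the outer frames.
table.add("JIT code", 0x9000, 0x9100);
const after = walkStack(table, framePcs);
console.log(after.complete, after.visited);
```

In this model the first walk stops right after DoCallFallback, and the second one completes once the JIT range has been added to the table.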
(In reply to David Major [:dmajor] from comment #4)
> It passes with --disable-ion. I think this is something to do with JIT
> frames and stack walking.

That's a configure flag, right? This will also disable Baseline. Judging from DoCallFallback you mentioned this is probably Baseline code.

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> Oh, that's bad! Maybe our x86-64 JIT doesn't set up the ABI stackwalk
> properly?

Unfortunately, stack walking for JIT code is unreliable, because Ion code can allocate the frame pointer register like any other register... Baseline code should maintain the frame pointer, but I'm not sure this works 100% of the time. There are also some differences between x64 and Win64; I'll take a look at that.

Do we know what mechanism they use?
"they" in this case is the official x86-64 ABI for Windows and the link dmajor provided.
Win64 requires all dynamically generated code to register unwind info with the runtime for SEH to work, and we totally don't do that (it'd be a significant undertaking). This is only a problem because Breakpad uses the unhandled exception filter, which runs after SEH (and never gets called if SEH fails). Fortunately, we can simply switch to a "vectored" exception handler, which runs before SEH and doesn't depend on any stack walking. I filed bug 844196 on this a while ago. An important corollary, though, is that we can't use SEH anywhere in Firefox (well, on any threads that can interleave JIT code).
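The ordering argument can be sketched as a small toy model. This is only an illustration of the dispatch order as described in this comment: dispatchException, makeCrashReporter, and the returned strings are hypothetical names, not Windows or Breakpad APIs.

```javascript
// Toy model only: the dispatch order is taken from the comment above, and
// every name here is a hypothetical stand-in, not a Windows or Breakpad API.

function dispatchException(handlers, stackWalkSucceeds) {
  // 1. Vectored handlers run first and need no stack walk.
  for (const handler of handlers.vectored) {
    if (handler()) {
      return "handled by vectored handler";
    }
  }
  // 2. Frame-based SEH has to walk the stack using registered unwind info.
  if (!stackWalkSucceeds) {
    return "walk failed, filter never called";
  }
  // 3. The unhandled exception filter runs only after the SEH search is done.
  if (handlers.unhandledFilter !== null && handlers.unhandledFilter()) {
    return "handled by unhandled exception filter";
  }
  return "unhandled";
}

// Stand-in for a crash reporter that writes a minidump when it is invoked.
function makeCrashReporter() {
  const reporter = {
    minidumps: 0,
    onException() {
      reporter.minidumps += 1;
      return true;
    },
  };
  return reporter;
}

// JIT frames without unwind info are on the stack, so the walk fails.
const viaFilter = makeCrashReporter();
console.log(
  dispatchException({ vectored: [], unhandledFilter: viaFilter.onException }, false),
  viaFilter.minidumps
);

// The same crash with the reporter installed as a vectored handler.
const viaVectored = makeCrashReporter();
console.log(
  dispatchException({ vectored: [viaVectored.onException], unhandledFilter: null }, false),
  viaVectored.minidumps
);
```

With the reporter installed only as the unhandled exception filter, the model never reaches it when the walk fails, which matches the "No minidump found!" failure. Installed as a vectored handler, it runs regardless of whether the walk would succeed.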
Flags: needinfo?(jdemooij)
Depends on: 844196
Blocks: 880004
Blocks: 886640
Is this working post-bug 844196?
Yep! Was just waiting for a Date run. https://treeherder.mozilla.org/ui/#/jobs?repo=date&revision=335bc10c5ecb
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Flags: qe-verify-