Closed
Bug 539530
Opened 16 years ago
Closed 15 years ago
Test test_fpuhandler.html crashes on try and electrolysis
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: cmtalbert, Assigned: cmtalbert)
References
Details
(Whiteboard: [unittest][try-server])
There is something unusual going on with Mozilla Try. The test toolkit/xre/test/test_fpuhandler.html seems to crash there. It does not crash on mozilla-central, but it has crashed on every try run since it was checked in. Here are some build logs:
* http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1263400506.1263409600.838.gz#err3
* http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1263399447.1263409304.30011.gz#err0
Running on my own box, I can't reproduce this crash. Right now, jmaher and I are testing a patch to try that reverts this test to see if try goes green. Will update here with the status.
Comment 1•16 years ago
It's crashing or hanging on electrolysis also. I really don't want to disable the test, since it covers a major topcrasher that is easy to regress. I'd be really happy if somebody caught it in a debugger. It's possible that this is somehow related to individual build config settings. Possibilities include:
* enablement or disablement of the crash reporter at build time
* enablement or disablement of the crash reporter in the testsuite
* configuration of the JIT debugger on the build slave
Comment 2•16 years ago
Running without the specific test yields a full pass on mochitest-plain. I have verified in my own build environment (using the .mozconfig for win32 from the tryserver) that this test passes. It could be that I am building with a different version of Windows or the SDK, or, as bsmedberg mentions above, it could be something related to the JIT debugger on the build slave.
Comment 3•16 years ago
There should be no difference between electrolysis and mozilla-central in how the crash reporter is disabled at test time. It's possible try has drifted out of sync, but I don't recall any recent changes there. The JIT debugger is still enabled on all of our try and non-try slaves; bug 533185 covers fixing that.
Comment 4•16 years ago
We set MINIDUMP_STACKWALK=/path/to/stackwalker and MOZ_CRASHREPORTER_NO_REPORT=1 on both try and non-try slaves when running mochitest, so everything else is up to the test suite. We also set MOZ_AIRBAG=1 but that probably doesn't do anything anymore.
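For anyone trying to mirror the slave setup locally, the environment described above could be assembled roughly like this. It is only a sketch: the helper name `mochitest_env` is hypothetical, the stackwalker path is left as the placeholder from the comment, and how the real buildbot steps pass these variables is not shown in this bug.

```python
import os


def mochitest_env(stackwalk_path, base_env=None):
    """Return a copy of the environment with the crash-reporter variables set.

    Hypothetical helper; the variable names and values come from the comment
    above, everything else is an assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MINIDUMP_STACKWALK"] = stackwalk_path
    # Keep the crash reporter from submitting reports during test runs.
    env["MOZ_CRASHREPORTER_NO_REPORT"] = "1"
    # Still set on the slaves, though it probably has no effect anymore.
    env["MOZ_AIRBAG"] = "1"
    return env
```

The returned dictionary could then be passed as `env=` to `subprocess.run` when launching the test harness.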
There is no crashreporter ac_options in either mozconfig:
http://hg.mozilla.org/build/buildbot-configs/file/default/tryserver/mozconfig-win32-unittest
http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla2/win32/mozilla-central/nightly/mozconfig
Note that try runs mochitest-plain as one big chunk, as does electrolysis, while mozilla-central breaks it into 5 chunks. We also use the opt builds for m-c, while it's opt+refcount for the other two.
Comment 5•16 years ago
Also note that Try doesn't do PGO builds
Updated•16 years ago
Whiteboard: [unittest][try-server]
Comment 6•16 years ago
I dunno what we can do about this bug.
Summary: Test test_fpuhandler.html crashes only on Mozilla Try → Test test_fpuhandler.html crashes on try and electrolysis
Comment 7•16 years ago
(In reply to comment #6)
> I dunno what we can do about this bug.
Whoops, didn't mean to post that comment!
Clint, would it be helpful if you borrowed a slave for a while to try to reproduce, debug, and solve the issue?
Comment 8•16 years ago
Per IRC, Clint is going to look into this. Currently blocked on getting a copy of the ref VM.
Assignee: nobody → ctalbert
Ben, thanks so much for the VM. So with the VM and with a little help in seeing the obvious from John O'Duinn I understand a little bit of what's going on here.
I have managed to recreate the failure. It is a busy hang: the build becomes unresponsive. I'm building debug now to see if I can recreate the issue there.
We don't see this on mozilla-central but do see it on e10s and try because mozilla-central splits the mochitests into chunks. This bug only recurs if you run the entire mochitest suite from start to finish. Running just fpuhandler works, and running the entire toolkit directory works, but running the whole suite fails.
So, this is really interesting and useful in that regard as it could help pinpoint why we see these sorts of behaviors in the mochitests. I want to keep trying to debug to see if we can determine what is going on here. Unfortunately, that means I may have to keep the VM for a little while as each run here is going to take a while. I'm going to try to build it and see if I can catch this in a record/replay vm since reproducing it might be related more to the running of the entire test suite and less related to the actual hardware/build of the try server VM. I'll let you know what I find on Monday.
Ben, when do you need this vm back?
Comment 10•16 years ago
Glad to hear you're making progress.
> Ben, when do you need this vm back?
No rush, keep it as long as you need it.
Comment 11•16 years ago
ok, that's a good clue. I wonder if it's possible that the breakpad tests (which unregister and re-register the breakpad exception handler) are somehow causing the other test to fail...
Comment 12•16 years ago
There aren't any Breakpad mochitests. (There are some browser-chrome tests and some xpcshell tests, but none of them currently unset the handler.)
| Assignee | ||
Comment 13•16 years ago
Ok, this gets weirder.
I've managed to cause this anytime a debugger is attached. If you have a debugger attached and you've run with --test-path=toolkit/xre/test so that you will only execute this test, then it will happen.
I've replicated this both on the tinderbox Windows Server 2003 system with MSVC 8 as the debugger and on a 32-bit Windows 7 system with MSVC 9. In both cases, calling _control87 raises 0xC0000090 (EXCEPTION_FLT_INVALID_OPERATION), which is a catch-all exception for floating point manipulation.
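When comparing the code reported by the debugger with what shows up in a normal run, a small lookup like the following may help. It is a sketch: the table covers only the Windows floating-point exception codes, and the helper name `describe_exception_code` is made up for this example.

```python
# Windows floating-point exception codes (values as defined in the Windows SDK).
FLOAT_EXCEPTION_NAMES = {
    0xC000008D: "EXCEPTION_FLT_DENORMAL_OPERAND",
    0xC000008E: "EXCEPTION_FLT_DIVIDE_BY_ZERO",
    0xC000008F: "EXCEPTION_FLT_INEXACT_RESULT",
    0xC0000090: "EXCEPTION_FLT_INVALID_OPERATION",
    0xC0000091: "EXCEPTION_FLT_OVERFLOW",
    0xC0000092: "EXCEPTION_FLT_STACK_CHECK",
    0xC0000093: "EXCEPTION_FLT_UNDERFLOW",
}


def describe_exception_code(code):
    """Map an exception code to its name; accepts signed 32-bit values too."""
    # Some tools print the code as a negative signed integer, so normalize it.
    code &= 0xFFFFFFFF
    return FLOAT_EXCEPTION_NAMES.get(code, "unknown (0x%08X)" % code)
```

Masking to 32 bits means the same helper works whether a log shows the code in hex or as a signed decimal.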
I'm running another full mochitest run on the tinderbox VM to verify that the exception thrown during a normal run is the same as what I'm seeing from the debugger and not some kind of debugging anomaly, because recall that it never occurred previously on the windows 7 32 bit vm.
More results as I have them, but any advice you have is greatly appreciated, this is turning into quite a head-scratcher.
Comment 14•16 years ago
Cause what? A floating-point exception? That is to be expected: last-chance exception handlers (such as breakpad and the FPU handler being tested here) only run when a debugger is not attached.
However, you can attach a debugger to the exception handler *after* you've triggered it (I do this by inserting a MessageBox in the handler itself).
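As a toy model of the behaviour described above (not the Windows API, and with made-up names throughout), the dispatch decision looks roughly like this: when a debugger is attached it sees the exception first, so the last-chance handler never gets a turn.

```python
def dispatch_unhandled_exception(debugger_attached, last_chance_handler):
    """Toy model: decide who gets an otherwise-unhandled exception."""
    if debugger_attached:
        # The debugger intercepts the exception; the handler is never called.
        return "debugger"
    return last_chance_handler()


def fpu_handler():
    # Hypothetical stand-in for the handler the test is exercising.
    return "last-chance handler"
```

Under this model, running the test with a debugger attached from the start always reports the floating-point exception, which matches the observation in comment 13 without implying anything is wrong with the handler itself.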
Comment 15•15 years ago
ctalbert: is this still happening since the recent rework of TryServer-as-a-branch?
Comment 16•15 years ago
I don't see this test failing in the series of runs I have done.
| Assignee | ||
Comment 17•15 years ago
me either --> WFM
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
Updated•12 years ago
Product: mozilla.org → Release Engineering