Closed
Bug 1371207
Opened 7 years ago
Closed 7 years ago
Intermittent test_crash.py TestCrash.test_crash_content_process | application crashed [@ js::ctypes::ConvertToJS]
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox54 | --- | unaffected |
firefox55 | --- | affected |
firefox56 | --- | affected |
People
(Reporter: intermittent-bug-filer, Assigned: whimboo)
References
Details
(Keywords: crash, intermittent-failure, regression)
Crash Data
Filed by: hskupin [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=104881595&repo=try https://queue.taskcluster.net/v1/task/aKKpiSxUSTG5m8e1lYSlOg/runs/0/artifacts/public/logs/live_backing.log
Assignee | ||
Comment 1•7 years ago
|
||
As seen when running my try build [1] for bug 1223277 there is a recent regression in our content specific unit crash test [2] for Marionette. Here the first frames of the crashing thread 0: https://treeherder.mozilla.org/logviewer.html#?job_id=104881595&repo=try&lineNumber=2932 [task 2017-06-06T13:35:00.677708Z] 13:35:00 INFO - Crash reason: SIGSEGV [task 2017-06-06T13:35:00.678500Z] 13:35:00 INFO - Crash address: 0x8 [task 2017-06-06T13:35:00.679270Z] 13:35:00 INFO - Process uptime: not available [task 2017-06-06T13:35:00.680007Z] 13:35:00 INFO - [task 2017-06-06T13:35:00.681489Z] 13:35:00 INFO - 0 libxul.so!js::ctypes::ConvertToJS [CTypes.cpp:5b605b51a2f8 : 3203 + 0x0] [task 2017-06-06T13:35:00.682305Z] 13:35:00 INFO - eip = 0xf3a4a7f9 esp = 0xffcf1ff0 ebp = 0xffcf20f8 ebx = 0xf56305ec [task 2017-06-06T13:35:00.683051Z] 13:35:00 INFO - esi = 0x00000000 edi = 0xffcf2028 eax = 0x00000008 ecx = 0xffffff81 [task 2017-06-06T13:35:00.683715Z] 13:35:00 INFO - edx = 0xffcf2101 efl = 0x00210297 [task 2017-06-06T13:35:00.684484Z] 13:35:00 INFO - Found by: given as instruction pointer in context [task 2017-06-06T13:35:00.685233Z] 13:35:00 INFO - 1 libxul.so!js::ctypes::PointerType::ContentsGetter [CTypes.cpp:5b605b51a2f8 : 5342 + 0x22] [task 2017-06-06T13:35:00.685965Z] 13:35:00 INFO - eip = 0xf3a4fb68 esp = 0xffcf2100 ebp = 0xffcf2188 ebx = 0xf56305ec [task 2017-06-06T13:35:00.686829Z] 13:35:00 INFO - esi = 0x00000000 edi = 0xf716b000 [task 2017-06-06T13:35:00.687495Z] 13:35:00 INFO - Found by: call frame info [task 2017-06-06T13:35:00.688319Z] 13:35:00 INFO - 2 libxul.so!js::ctypes::Property<js::ctypes::PointerType::IsPointer, js::ctypes::PointerType::ContentsGetter>::Fun [CallNonGenericMethod.h:5b605b51a2f8 : 100 + 0x12] [task 2017-06-06T13:35:00.689048Z] 13:35:00 INFO - eip = 0xf3a4fe1c esp = 0xffcf2190 ebp = 0xffcf21d8 ebx = 0xf56305ec [task 2017-06-06T13:35:00.689793Z] 13:35:00 INFO - esi = 0xda361790 edi = 0xffcf21a8 [task 2017-06-06T13:35:00.690537Z] 13:35:00 INFO - Found by: call frame info [task 2017-06-06T13:35:00.691338Z] 13:35:00 INFO - 3 libxul.so!js::InternalCallOrConstruct [jscntxtinlines.h:5b605b51a2f8 : 293 + 0x10] [task 2017-06-06T13:35:00.692091Z] 13:35:00 INFO - eip = 0xf3aae987 esp = 0xffcf21e0 ebp = 0xffcf2278 ebx = 0xf56305ec [task 2017-06-06T13:35:00.692831Z] 13:35:00 INFO - esi = 0xffcf2240 edi = 0xf716b000 [task 2017-06-06T13:35:00.693564Z] 13:35:00 INFO - Found by: call frame info I did a regression test locally and it turns out the changes on bug 1365307 are causing this problem. Maybe the throttling of the HTTP transaction somewhat conflicts with the shutdown of Firefox as triggered by the content crash by the crash reporter (to be enabled with MOZ_CRASHREPORTER_SHUTDOWN env variable)? To run this test you can use the following command (maybe do an s/def test_/def tst_/ before to only run this single test): > mach marionette-test testing/marionette/harness/marionette_harness/tests/unit/test_crash.py [1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=5b605b51a2f8dbad15b671a91ec49abad550a58c&selectedJob=104881595&bugfiler [2] https://dxr.mozilla.org/mozilla-central/rev/a49112c7a5765802096b3fc298069b9495436107/testing/marionette/harness/marionette_harness/tests/unit/test_crash.py#108
Blocks: 1223277
status-firefox54:
--- → unaffected
status-firefox55:
--- → affected
Component: Marionette → JavaScript Engine
Flags: needinfo?(mcmanus)
Flags: needinfo?(honzab.moz)
Keywords: regression
OS: Unspecified → All
Product: Testing → Core
Hardware: Unspecified → All
Version: Version 3 → 55 Branch
Assignee | ||
Comment 3•7 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #1) > As seen when running my try build [1] for bug 1223277 there is a recent > regression in our content specific unit crash test [2] for Marionette. Here Just to add, I do not have to attach my patch from bug 1223277 to reproduce it locally. I don't know why we do not see this failure on Treeherder, but maybe this is also related to artifact builds I locally do? The frequency of test failures increased a lot over the last days. Means with bug 1365307 landed it happened 1 out of 10 repeats (--repeat 10). But with latest mozilla-central it is close to fail each time I run this test.
Comment 4•7 years ago
|
||
Henrik, can you please recheck if this is still a problem after bug 1369496 has landed? If yes, I'll start looking into it early next week.
Flags: needinfo?(hskupin)
Assignee | ||
Comment 5•7 years ago
|
||
Yes, it's still happening with an artifact build created from changeset https://hg.mozilla.org/mozilla-central/rev/1742b1bdadd1. Maybe it's also somewhat related to the following assertions? Assertion failed at /home/worker/workspace/build/src/gfx/cairo/cairo/src/cairo-hash.c:196: hash_table->live_entries == 0 WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME. FIX THIS! [Parent 56485] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /home/worker/workspace/build/src/xpcom/build/XPCOMInit.cpp, line 1042 [Parent 56485] WARNING: '!compMgr', file /home/worker/workspace/build/src/xpcom/components/nsComponentManagerUtils.cpp, line 63 Leaked URLs: There are a huge amount of URLs which leak. And then I see: Assertion failure: isEmpty() (failing this assertion means this LinkedList's creator is buggy: it should have removed all this list's elements before the list's destruction), at /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/LinkedList.h:455 #01: __cxa_finalize_ranges[/usr/lib/system/libsystem_c.dylib +0x60178] #02: exit[/usr/lib/system/libsystem_c.dylib +0x604b2]
Flags: needinfo?(hskupin)
Comment 6•7 years ago
|
||
Thanks. I'll take this bug, but as I said, I won't be able to start with it sooner than early next week. Hope it's OK.
Assignee: nobody → honzab.moz
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 7•7 years ago
|
||
Sounds great. Thank you in advance.
Comment 8•7 years ago
|
||
I can't run the test locally: "Timed out waiting for connection on localhost:2828" To be honest, I don't believe I will be of much help here. The code added in bug 1365307 doesn't touch any JS or linked list code, even distantly. If the crash has already existed before, then this is just and only a timing issue or the code restructure by optimization so that the crash just occurs more often. You should contact people responsible for the code that crashes, not someone who wrote a patch that happens to make it worse by clearly unrelated changes. I'll try to clone and build the try changeset locally to get exactly the same build yet. If I still fail to run the test or fail to reproduce the crash I'll return the bug back to the ring. In the mean time, can you do a try push or a local test that will turn "network.http.throttle.enable" to false? That disables some parts of the functionality introduced in bug 1365307, just check if there is any effect. Thanks.
Comment 9•7 years ago
|
||
@ /c/Mozilla/src/try-5b605b51a2f8dbad15b671a91ec49abad550a58c $ ./mach marionette-test testing/marionette/harness/marionette_harness/tests/unit/test_crash.py 0:00.00 LOG: MainThread INFO Using workspace for temporary data: "c:\Mozilla\src\try-5b605b51a2f8dbad15b671a91ec49abad550a58c" 0:00.00 LOG: MainThread mozversion INFO application_buildid: 20170613140333 0:00.00 LOG: MainThread mozversion INFO application_display_name: Nightly 0:00.00 LOG: MainThread mozversion INFO application_id: {ec8030f7-c20a-464f-9b0e-13a3a9e97384} 0:00.00 LOG: MainThread mozversion INFO application_name: Firefox 0:00.00 LOG: MainThread mozversion INFO application_remotingname: firefox 0:00.00 LOG: MainThread mozversion INFO application_vendor: Mozilla 0:00.00 LOG: MainThread mozversion INFO application_version: 55.0a1 0:00.00 LOG: MainThread mozversion INFO platform_buildid: 20170613140333 0:00.00 LOG: MainThread mozversion INFO platform_version: 55.0a1 0:00.01 LOG: MainThread INFO Application command: c:/_/obj/try-5b605b51a2f8dbad15b671a91ec49abad550a58c/_obj-browser-debug\dist\bin\firefox.exe -no-remote -marionette -profile c:\users\xmayhe~1\appdata\local\temp\tmpri1vph.mozrunner Error running mach: ['marionette-test', 'testing/marionette/harness/marionette_harness/tests/unit/test_crash.py'] The error occurred in code that was called by the mach command. This is either a bug in the called code itself or in the way that mach is calling it. You should consider filing a bug for this issue. If filing a bug, please include the full output of mach, including this error message. The details of the failure are as follows: IOError: Process has been unexpectedly closed (Exit code: 2147483651) (Reason: Timed out waiting for connection on localhost:2828!) File "c:\Mozilla\src\try-5b605b51a2f8dbad15b671a91ec49abad550a58c\testing/marionette/mach_commands.py", line 81, in run_marionette_test return run_marionette(tests, topsrcdir=self.topsrcdir, **kwargs) File "c:\Mozilla\src\try-5b605b51a2f8dbad15b671a91ec49abad550a58c\testing/marionette/mach_commands.py", line 59, in run_marionette failed = MarionetteHarness(MarionetteTestRunner, args=vars(args)).run() File "c:\Mozilla\src\try-5b605b51a2f8dbad15b671a91ec49abad550a58c\testing/marionette/harness\marionette_harness\runtests.py", line 72, in run runner.run_tests(tests) File "c:\Mozilla\src\try-5b605b51a2f8dbad15b671a91ec49abad550a58c\testing/marionette/harness\marionette_harness\runner\base.py", line 827, in run_tests self.marionette = self.driverclass(**self._build_kwargs()) File "c:\Mozilla\src\try-5b605b51a2f8dbad15b671a91ec49abad550a58c\testing/marionette/client\marionette_driver\marionette.py", line 611, in __init__ self.raise_for_port(timeout=self.startup_timeout) File "c:\Mozilla\src\try-5b605b51a2f8dbad15b671a91ec49abad550a58c\testing/marionette/client\marionette_driver\decorators.py", line 28, in _ m._handle_socket_failure() File "c:\Mozilla\src\try-5b605b51a2f8dbad15b671a91ec49abad550a58c\testing/marionette/client\marionette_driver\decorators.py", line 23, in _ return func(*args, **kwargs) File "c:\Mozilla\src\try-5b605b51a2f8dbad15b671a91ec49abad550a58c\testing/marionette/client\marionette_driver\marionette.py", line 697, in raise_for_port self.host, self.port)) Giving up.
Assignee: honzab.moz → nobody
Comment 10•7 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #5) > Assertion failure: isEmpty() (failing this assertion means this LinkedList's > creator is buggy: it should have removed all this list's elements before the > list's destruction), at > /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/LinkedList. > h:455 > #01: __cxa_finalize_ranges[/usr/lib/system/libsystem_c.dylib +0x60178] > #02: exit[/usr/lib/system/libsystem_c.dylib +0x604b2] I from time to time am getting this assertion failure too at shutdown when running my local build as a regular browser. I managed to reveal the element, it's nsSHistory pointing to doc shell with chrome://global/content/backgroundPageThumbs.xhtml as current URI. The crashing process is the parent process. I had the net logging on and all transactions were according it released. So I don't see a connection with bug 1365307.
Updated•7 years ago
|
status-firefox56:
--- → affected
Assignee | ||
Comment 11•7 years ago
|
||
Honza, thank you for digging into the problem. When I regression tested the crash behavior it indeed started with your push. But maybe it's a side-effect including something else. Especially because the failure rate raised even more afterward, so a combination might be possible. I will do some more investigation myself in the next days.
Assignee | ||
Comment 12•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #10) > > Assertion failure: isEmpty() (failing this assertion means this LinkedList's > > creator is buggy: it should have removed all this list's elements before the > > list's destruction), at > > /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/LinkedList. > > h:455 > > #01: __cxa_finalize_ranges[/usr/lib/system/libsystem_c.dylib +0x60178] > > #02: exit[/usr/lib/system/libsystem_c.dylib +0x604b2] So this is definitely a different crash and may overlay the underlying problem I was trying to investigate on this bug. I will file a separate bug for this specific crash, and will come back to this one once fixed.
Assignee | ||
Comment 13•7 years ago
|
||
I was able to reproduce this crash by a try build for another patch I'm working on: https://treeherder.mozilla.org/#/jobs?repo=try&revision=de0f62d300600e1e6586f72bf516a217499efd9a It reproduces constantly, so I will try if I can use this method to track down the regression range.
Assignee | ||
Comment 14•7 years ago
|
||
So this may actually only be a side-effect of the crash we force in that test. Because we do not mock the mozcrash module long enough, the original module picks up the minidump files and processes it. This should actually not happen, but is due to delayed minidump file creation. Maybe the delay got introduced by the changes on bug 1369496. Anyway, on bug 1223277 I added code which extends the lifetime of the mozcrash mock, so that the mock fetches it correctly, and that this crash is no longer visible in the Treeherder logs. This should be fine enough.
Assignee | ||
Comment 15•7 years ago
|
||
It might be fixed by the changes of bug 1223277, but I'm not fully sure. So marking as WFM for now.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•