Intermittent PROCESS-CRASH | test_child_mozinfo.js | application crashed [@ MessageLoop::DeletePendingTasks()]

RESOLVED DUPLICATE of bug 1355196

Status

RESOLVED DUPLICATE of bug 1355196
2 years ago
11 months ago

People

(Reporter: aryx, Unassigned)

Tracking

({intermittent-failure})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [stockwell unknown])

https://treeherder.mozilla.org/logviewer.html#?job_id=76634347&repo=mozilla-inbound

04:38:56     INFO - TEST-PASS | z:\build\build\src\testing\xpcshell\selftest.py | XPCShellTestsTests.testChild
04:38:56     INFO - TEST-PASS | z:\build\build\src\testing\xpcshell\selftest.py | XPCShellTestsTests.testChildFail
04:38:56     INFO - TEST-PASS | z:\build\build\src\testing\xpcshell\selftest.py | XPCShellTestsTests.testChildHang
04:38:56  WARNING - TEST-UNEXPECTED-FAIL | z:\build\build\src\testing\xpcshell\selftest.py | XPCShellTestsTests.testChildMozinfo, line 1337: Tests should have passed, log:
04:38:56     INFO - FAIL: testChildMozinfo (__main__.XPCShellTestsTests)
04:38:56     INFO - Traceback (most recent call last):
04:38:56     INFO -   File "z:\build\build\src\testing\xpcshell\selftest.py", line 1337, in testChildMozinfo
04:38:56     INFO -     self.assertTestResult(True)
04:38:56     INFO -   File "z:\build\build\src\testing\xpcshell\selftest.py", line 490, in assertTestResult
04:38:56     INFO -     """ % ("passed" if expected else "failed", self.log.getvalue()))
04:38:56     INFO - AssertionError: Tests should have passed, log:
04:38:56     INFO - ========
04:38:56     INFO - MOZ_NODE_PATH environment variable not set. Tests requiring http/2 will fail.
04:38:56     INFO - Running tests sequentially.
04:38:56     INFO - SUITE-START | Running 1 tests
04:38:56     INFO - TEST-START | test_child_mozinfo.js
04:38:56  WARNING - TEST-UNEXPECTED-FAIL | test_child_mozinfo.js | xpcshell return code: 1
04:38:56     INFO - TEST-INFO took 2012ms

04:38:54     INFO - TEST-PASS | z:\build\build\src\build\compare-mozconfig\compare-mozconfigs-wrapper.py | TestCompareMozconfigs.test_compare_mozconfigs
04:38:54     INFO - z:\build\build\src\config\tests\unit-mozunit.py
04:38:54     INFO - TEST-PASS | z:\build\build\src\config\tests\unit-mozunit.py | TestMozUnit.test_mocked_open
04:38:56     INFO - z:\build\build\src\testing\xpcshell\selftest.py
04:38:56     INFO - PROCESS-CRASH | test_child_mozinfo.js | application crashed [@ MessageLoop::DeletePendingTasks()]
04:38:56     INFO - Crash dump filename: c:\users\task_1486869950\appdata\local\temp\xpc-other-u9ssch\f2788641-fa11-431e-b495-a3073fec775d.dmp
04:38:56     INFO - Operating system: Windows NT
04:38:56     INFO -                   6.3.9600
04:38:56     INFO - CPU: x86
04:38:56     INFO -      GenuineIntel family 6 model 63 stepping 2
04:38:56     INFO -      16 CPUs
04:38:56     INFO - Crash reason:  EXCEPTION_BREAKPOINT
04:38:56     INFO - Crash address: 0x6ce0d4fa
04:38:56     INFO - Thread 3 (crashed)
04:38:56     INFO -  0  xul.dll!MessageLoop::DeletePendingTasks() [message_loop.cc:549115eef1f6 : 407 + 0x18]
04:38:56     INFO -     eip = 0x6ce0d4fa   esp = 0x072dfbac   ebp = 0x072dfbd4   ebx = 0x072dfc44
04:38:56     INFO -     esi = 0x00000197   edi = 0x00000000   eax = 0x00000000   ecx = 0x716606ef
04:38:56     INFO -     edx = 0xfedaf000   efl = 0x00000206
04:38:56     INFO -     Found by: given as instruction pointer in context
04:38:56     INFO -  1  xul.dll!MessageLoop::~MessageLoop() [message_loop.cc:549115eef1f6 : 189 + 0x6]
04:38:56     INFO -     eip = 0x6ce0b4ce   esp = 0x072dfbb8   ebp = 0x072dfbd4
04:38:56     INFO -     Found by: call frame info
04:38:56     INFO -  2  xul.dll!base::Thread::ThreadMain() [thread.cc:549115eef1f6 : 196 + 0x10]
04:38:56     INFO -     eip = 0x6ce12ad6   esp = 0x072dfbdc   ebp = 0x072dfca8
04:38:56     INFO -     Found by: call frame info
04:38:56     INFO -  3  xul.dll!`anonymous namespace'::ThreadFunc [platform_thread_win.cc:549115eef1f6 : 28 + 0x7]
04:38:56     INFO -     eip = 0x6ce02c69   esp = 0x072dfcb0   ebp = 0x072dfcb0
04:38:56     INFO -     Found by: call frame info
04:38:56     INFO -  4  kernel32.dll + 0x17c03
04:38:56     INFO -     eip = 0x771c7c04   esp = 0x072dfcb8   ebp = 0x072dfcc4
04:38:56     INFO -     Found by: call frame info
04:38:56     INFO -  5  ntdll.dll + 0x5ab8e
04:38:56     INFO -     eip = 0x7757ab8f   esp = 0x072dfccc   ebp = 0x072dfd0c
04:38:56     INFO -     Found by: previous frame's frame pointer
04:38:56     INFO -  6  ntdll.dll + 0x5ab59
04:38:56     INFO -     eip = 0x7757ab5a   esp = 0x072dfd14   ebp = 0x072dfd1c
04:38:56     INFO -     Found by: previous frame's frame pointer
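
For triage across this bug and the see-also bugs, it can help to pull the test name and crash signature out of the PROCESS-CRASH line shown in the log above. The following is only a minimal sketch: the regular expression is inferred from the single log line quoted in this bug, and the helper name `parse_process_crash` is hypothetical, not part of any harness.

```python
import re

# Matches lines of the form seen in the log above:
#   PROCESS-CRASH | <test name> | application crashed [@ <signature>]
# The pattern is an assumption based on that one example line.
CRASH_RE = re.compile(
    r"PROCESS-CRASH \| (?P<test>.+?) \| application crashed \[@ (?P<signature>.+)\]\s*$"
)


def parse_process_crash(line):
    """Return (test, signature) for a PROCESS-CRASH log line, else None."""
    match = CRASH_RE.search(line)
    if match is None:
        return None
    return match.group("test"), match.group("signature")


line = (
    "04:38:56     INFO - PROCESS-CRASH | test_child_mozinfo.js | "
    "application crashed [@ MessageLoop::DeletePendingTasks()]"
)
print(parse_process_crash(line))
```

Lines that do not contain a PROCESS-CRASH record return None, so the helper can be run over a whole log.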
6 failures in 867 pushes (0.007 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 4
* graphics: 1
* autoland: 1

Platform breakdown:
* windows2012-32: 6

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-04-03&endday=2017-04-09&tree=all
5 failures in 894 pushes (0.006 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 3
* graphics: 1
* autoland: 1

Platform breakdown:
* windows2012-32: 4
* windows2012-32-qr: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-04-10&endday=2017-04-16&tree=all
7 failures in 817 pushes (0.009 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 7

Platform breakdown:
* windows2012-32: 7

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-04-17&endday=2017-04-23&tree=all
12 failures in 770 pushes (0.016 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 6
* try: 3
* mozilla-inbound: 2
* mozilla-central: 1

Platform breakdown:
* windows2012-32: 12

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-05-01&endday=2017-05-07&tree=all
14 failures in 777 pushes (0.018 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 10
* mozilla-inbound: 3
* mozilla-central: 1

Platform breakdown:
* windows2012-32: 12
* windows2012-64: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-05-15&endday=2017-05-21&tree=all
See Also: → bug 1339426
31 failures in 891 pushes (0.035 failures/push) were associated with this bug in the last 7 days. 

This is the #47 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* autoland: 15
* mozilla-inbound: 9
* mozilla-central: 6
* try: 1

Platform breakdown:
* windows2012-32: 17
* windows2012-64: 14

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-05-22&endday=2017-05-28&tree=all
This is on our radar to get fixed. What is odd is that it appears to fail only on taskcluster builds (a different OS image and runtime environment).

:grenade, do you have any thoughts on how to track this down?  Maybe there are specific machine instances that have dirty filesystems?
Flags: needinfo?(rthijssen)
Whiteboard: [stockwell needswork]
on the builders, we're unlikely to suffer from dirty filesystems because we create a new task user for each build which gets:
- Z:\task_xxxxx: a new working directory
- C:\Users\task_xxxxx: a new home directory
- C:\Users\task_xxxxx\AppData: new application and temporary directories

buildbot builds don't get this luxury and share AppData and Temp data under C:\Users\cltbld\AppData and are more likely to suffer from dirty filesystems.

we've seen some problems in the past from test jobs not using correct temp or appdata, so one thing to look out for in the logs is the string "C:\Users\GenericWorker\AppData" (or variations with backslashes or double forward slashes). if that exists, there's a problem (tasks are not using the correct folders to store temp or appdata and using the parent process environment instead).

apart from that, or if we're not seeing that in fail logs, then i think we can discount dirty filesystems as we shouldn't have those on tc windows builders.
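
The log check described above could be scripted roughly as follows. This is a sketch under assumptions: the helper name `find_shared_appdata_lines` is hypothetical, the match is case-insensitive because the logs in this bug print paths in lower case, and "variations" is interpreted as one or more backslashes or forward slashes between path components.

```python
import re

# One or more backslashes or forward slashes between path components, so that
# C:\Users\..., C:\\Users\\... and C://Users//... all match (an assumption
# about which variations can show up in a log).
SEP = r"[\\/]+"
GENERIC_WORKER_APPDATA = re.compile(
    "c:" + SEP + "users" + SEP + "genericworker" + SEP + "appdata",
    re.IGNORECASE,
)


def find_shared_appdata_lines(log_lines):
    """Return (line_number, line) pairs that mention GenericWorker's AppData."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if GENERIC_WORKER_APPDATA.search(line)
    ]


# Made-up sample lines, for illustration only; they are not taken from a real log.
sample = [
    r"INFO - TEMP=C:\Users\task_xxxxx\AppData\Local\Temp",
    r"INFO - TEMP=C:\\Users\\GenericWorker\\AppData\\Local\\Temp",
    "INFO - profile at C://Users//GenericWorker//AppData//Roaming",
]
for number, text in find_shared_appdata_lines(sample):
    print(number, text)
```

An empty result for a failing job's log would be consistent with the task using its own per-task directories.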

testers are a different story as they do share dirty environments and run as GenericWorker, but the failures i see linked in this bug all happened on 2012 builders (even though, bizarrely, they are running xpcshell tests; no idea what that's about).
Flags: needinfo?(rthijssen)
Thanks Rob for the info.  I had overlooked these were selftests on the builders.  It looks to be a debug only issue and always failing on the same child process test.  Both this bug and bug 1339426 have debug only failures in the build selftests on child process tests.

here is an example log:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=102177096&lineNumber=22260

:jimm, as this looks to be more related to child process tests, could you help find someone to look at this or help us uncover what is going on?
Flags: needinfo?(jmathies)
See Also: → bug 1359596
Current crashes here and in see-also bugs all seem to be caused by a failed assertion at:

https://hg.mozilla.org/mozilla-central/annotate/bdb2387396b4a74dfefb7c983733eed3625e906a/ipc/chromium/src/base/message_loop.cc#l410

   bool MessageLoop::DeletePendingTasks() {
       MOZ_ASSERT(work_queue_.empty());

That assertion was added in bug 976363, years ago.
(In reply to Joel Maher ( :jmaher) from comment #9)
> Thanks Rob for the info.  I had overlooked these were selftests on the
> builders.  It looks to be a debug only issue and always failing on the same
> child process test.  Both this bug and bug 1339426 have debug only failures
> in the build selftests on child process tests.
> 
> here is an example log:
> https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=102177096&lineNumber=22260
> 
> :jimm, as this looks to be more related to child process tests, could you
> help find someone to look at this or help us uncover what is going on?

Isn't this just a failed test run?

WARNING - TEST-UNEXPECTED-FAIL | z:\build\build\src\testing\xpcshell\selftest.py | XPCShellTestsTests.testChildMozinfo, line 1326: Tests should have passed
Flags: needinfo?(jmathies)
this is an intermittent failure that occurs during a build, in the selftest.py for xpcshell, specifically in the child process tests.
27 failures in 820 pushes (0.033 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 20
* mozilla-inbound: 6
* try: 1

Platform breakdown:
* windows2012-32: 18
* windows2012-64: 9

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-05-29&endday=2017-06-04&tree=all
:jimm- between this bug, bug 1359596, and bug 1339426, we are near 100 failures/week- do you have ideas on who we could talk to in order to get this fixed?  Maybe someone who knows child process behavior?
Flags: needinfo?(jmathies)
26 failures in 864 pushes (0.03 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 13
* mozilla-inbound: 7
* try: 3
* mozilla-central: 3

Platform breakdown:
* windows2012-32: 22
* windows2012-64: 4

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-06-05&endday=2017-06-11&tree=all
:jimm, ping- I would like to get this resolved next week, if you don't have time I am happy to disable a few tests until you have more time.
16 failures in 814 pushes (0.02 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 9
* mozilla-inbound: 6
* mozilla-beta: 1

Platform breakdown:
* windows2012-32: 15
* windows2012-32-noopt: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-06-12&endday=2017-06-18&tree=all
(In reply to Joel Maher ( :jmaher) from comment #16)
> :jimm, ping- I would like to get this resolved next week, if you don't have
> time I am happy to disable a few tests until you have more time.

sorry for the delay, looking at this now.
Looks like this stopped on May 22nd. I might be missing something though. 

https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-05-22&endday=2017-05-28&tree=all

I was looking at logged stacks here and couldn't figure out why the line numbers didn't match.
Flags: needinfo?(jmaher)
we still get failures, 26 in the last week.  Remember, there are 2 other bugs as well: bug 1359596 and bug 1339426.

I don't understand the logged stacks and line numbers statement- is it possible the failure has changed since May 22nd?
Flags: needinfo?(jmaher)
26 failures in 892 pushes (0.029 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 13
* autoland: 11
* mozilla-central: 1
* mozilla-beta: 1

Platform breakdown:
* windows2012-32: 25
* windows2012-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-06-19&endday=2017-06-25&tree=all
Hey Joel, is there some way to trigger upload of these crash dumps as artifacts?

Crash dump filename: c:\users\task_1497897391\appdata\local\temp\xpc-other-ofxrvy\e030af38-b032-45ef-8030-e27922c2fa3f.dmp
Flags: needinfo?(jmaher)

Updated

a year ago
Flags: needinfo?(jmathies)
I was looking into this, unfortunately I am going out on holiday for 1.5 weeks, so I won't be able to verify this on try.

This failure is a job running in the build and we should ensure we have the right environment variables to generate a .dmp file and put it in the MOZILLA_UPLOAD_DIR (or is it MOZ_UPLOAD_DIR) which will then upload it.  Here is where the env vars for the job are defined:
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/builds/taskcluster_firefox_windows_64_debug.py?q=path%3Ataskcluster_firefox_windows_64_debug.py&redirect_type=single#56
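
To illustrate the idea of getting the dump files into an upload directory, here is a rough sketch. It is not how the build or the harness does this: `collect_minidumps` is a hypothetical helper, both directories are passed in explicitly because the correct environment variable name is still an open question here, and the `.extra` suffix is assumed to be the sidecar file that accompanies a minidump.

```python
import os
import shutil


def collect_minidumps(dump_dir, upload_dir):
    """Copy .dmp and .extra files from dump_dir into upload_dir.

    Returns the sorted list of file names that were copied.
    Hypothetical helper for illustration; not part of mozharness.
    """
    os.makedirs(upload_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(dump_dir)):
        # Only the crash dump and its sidecar are of interest.
        if name.lower().endswith((".dmp", ".extra")):
            shutil.copy2(
                os.path.join(dump_dir, name),
                os.path.join(upload_dir, name),
            )
            copied.append(name)
    return copied
```

A caller would pass the temp directory named in the "Crash dump filename" log line as `dump_dir`, and whichever directory the job actually uploads artifacts from as `upload_dir`.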

:gbrown, would it be possible for you to help :jimm out as needed on this bug?  Possibly a push or two to try, retriggering until this selftest failure shows up, so that you can help :jimm out or point him in the right direction.
Flags: needinfo?(jmaher) → needinfo?(gbrown)
22 failures in 718 pushes (0.031 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 14
* mozilla-inbound: 5
* mozilla-central: 2
* try: 1

Platform breakdown:
* windows2012-32: 22

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-06-26&endday=2017-07-02&tree=all
Happy to help, if I can. Test tasks put artifacts in MOZ_UPLOAD_DIR, but I don't know if that will work for builds. I'll see what I can figure out...
I think the MINIDUMP_SAVE_PATH should have been changed when we migrated to Taskcluster. I'm trying to correct that, but having some misadventures on try...hope to have this worked out tomorrow.
Depends on: 1378251
Produced .dmp and .extra artifacts in the crash in https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd4f921d7510ab3b26c928e9947fab2d46ebf4e6. 

Will fix in bug 1378251.
Flags: needinfo?(gbrown)
16 failures in 656 pushes (0.024 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 8
* mozilla-inbound: 6
* try: 1
* mozilla-beta: 1

Platform breakdown:
* windows2012-32: 16

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-07-03&endday=2017-07-09&tree=all
Whiteboard: [stockwell needswork] → [stockwell unknown]
12 failures in 720 pushes (0.017 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 5
* mozilla-inbound: 4
* try: 2
* mozilla-beta: 1

Platform breakdown:
* windows2012-32: 11
* windows2012-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-07-10&endday=2017-07-16&tree=all
Status: NEW → RESOLVED
Last Resolved: a year ago
Flags: needinfo?(jmathies)
Resolution: --- → DUPLICATE
Duplicate of bug: 1355196
5 failures in 822 pushes (0.006 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-beta: 2
* mozilla-inbound: 1
* mozilla-central: 1
* autoland: 1

Platform breakdown:
* windows2012-64: 3
* windows2012-32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-07-17&endday=2017-07-23&tree=all
1 failures in 1008 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-beta: 1

Platform breakdown:
* windows2012-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-07-24&endday=2017-07-30&tree=all
1 failures in 1032 pushes (0.001 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* try: 1

Platform breakdown:
* windows2012-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1338878&startday=2017-09-11&endday=2017-09-17&tree=all