Closed Bug 906793 Opened 11 years ago Closed 10 years ago

B2G mochitests intermittently just stop after "IMediaResourceManagerDeathNotifier: media server died" and/or "ServiceManager: service 'media.resource_manager' died"

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: RyanVM, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 obsolete file)

Worst of all, the error only shows up in the dump, so TBPL can't suggest anything for it.

https://tbpl.mozilla.org/php/getParsedLog.php?id=26722508&tree=B2g-Inbound

b2g_emulator_vm b2g-inbound opt test mochitest-3 on 2013-08-19 08:37:33 PDT for push 24fbe2665d27
slave: tst-linux64-ec2-108

09:15:48     INFO -  08-19 16:08:39.423   699   699 I GeckoDump: 15720 INFO TEST-PASS | /tests/content/smil/test/test_smilMappedAttrFromTo.xhtml | image-rendering: checking that 'from' value is set at start of animation
09:15:48     INFO -  08-19 16:08:39.433   699   699 I GeckoDump: 15721 INFO TEST-PASS | /tests/content/smil/test/test_smilMappedAttrFromTo.xhtml | image-rendering: checking value halfway through animation
09:15:48     INFO -  08-19 16:14:09.692   699  1015 W IMediaResourceManagerDeathNotifier: media server died
09:15:48     INFO -  08-19 16:14:09.692    33    33 I ServiceManager: service 'media.resource_manager' died
https://tbpl.mozilla.org/php/getParsedLog.php?id=26725885&tree=B2g-Inbound
Summary: B2G mochitests intermittently just stop (after "IMediaResourceManagerDeathNotifier: media server died") → B2G mochitests intermittently just stop after "IMediaResourceManagerDeathNotifier: media server died" and/or "ServiceManager: service 'media.resource_manager' died"
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #0)
> 09:15:48     INFO -  08-19 16:14:09.692   699  1015 W
> IMediaResourceManagerDeathNotifier: media server died
> 09:15:48     INFO -  08-19 16:14:09.692    33    33 I ServiceManager:
> service 'media.resource_manager' died

The above says that b2g process is died by some reason during when video codec is instantiating. On emulator, sw video codec is created.
Sotaro, can you please suggest an assignee for this bug?

https://tbpl.mozilla.org/php/getParsedLog.php?id=26959050&full=1&branch=b2g-inbound
Flags: needinfo?(sotaro.ikeda.g)
Depends on: 907745
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #15)
> Sotaro, can you please suggest an assignee for this bug?
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=26959050&full=1&branch=b2g-
> inbound

I take this bug. Before investigating the problem, Bug 907745 needs to be fixed. The bug seems to be fixed in near future.
Assignee: nobody → sotaro.ikeda.g
Depends on: 917252
I still haven't been able to figure out the interleaving, but figured I might as well upload the fix to the timeout error message while I was thinking about it.
Attachment #809412 - Flags: review?(jhammel)
Comment on attachment 809412 [details] [diff] [review]
Patch 1.0 - fix format errors in the timeout error message

Beh, sorry. I meant to upload this to bug 917252
Attachment #809412 - Attachment is obsolete: true
Attachment #809412 - Flags: review?(jhammel)
https://tbpl.mozilla.org/php/getParsedLog.php?id=30163533&tree=Mozilla-B2g26-v1.2

Andrew, this is probably the most frequent B2G emulator mochitest failure we hit and there hasn't been any action on it in months. Can we please get someone to look at this?
Flags: needinfo?(overholt)
It looks like mochitests may be randomly running out of memory.  In each of the failing lots, I see lots of lines like:

log     : OOM Message Logger Started
Bug 929005 might be related to this bug. but it is not clear.
Sotaro's the best person to investigate here.  Let's hope the work in bug 929005 helps.
Flags: needinfo?(overholt)
Depends on: 929005
Bug 935981 might be related. But it is not clear.
Depends on: 935981
The following log just say b2g process was died by some reason. It has basically no relation ship to media.
 > ServiceManager: service 'media.resource_manager' died
If there is no crash log and b2g process was killed by some reason, a possible cause is 'killed by low memory killer' or 'killed by out of memory killer'.
Many of the logs being pasted here are now no longer related the the issue in the summary.

I've filed bug 937684 for ensuring the "TEST-UNEXPECTED-FAIL | /foo/bar/baz.html | application timed out after 330.0 seconds with no output" lines don't get interleaved, and thus we have more to star against, and this hopefully avoid false positives here.
Depends on: 937684
https://tbpl.mozilla.org/php/getParsedLog.php?id=30843693&tree=Mozilla-Inbound&full=1

Sotaro, it seems unclear to me where we go next. Any ideas?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #95)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=30843693&tree=Mozilla-
> Inbound&full=1
> 
> Sotaro, it seems unclear to me where we go next. Any ideas?

Need to make clear if Comment 82 happens.
Flags: needinfo?(sotaro.ikeda.g)
dhylands, is it possible to output "adb shell dmesg" log?
Flags: needinfo?(dhylands)
So we recently added oom-msg-logger for on-device which will look for OOM related messages. These will then get captured and placed into the logcat log.

Currently, this only works on phones which our init.b2g.rc gets included with the kernel and we flash the kernel.

It looks like the OOM Message Logger is, in fact, trying to run. I see these near the end of the log:

10:55:35     INFO -  11-08 18:54:24.896  1288  1288 I log     : OOM Message Logger Started

We discovered that it really needs to run as root, and this was fixed just recently in Bug 939370

Looking at the more recent logs (after Nov 15) I see that the OOM Message Logger is running, but I'm not seeing any output from it, so it would appear that the media server isn't dying due to the low memory OOM killer.

Looking at the logs:

02:06:10     INFO -  11-25 10:04:23.585   702   702 I GeckoDump: 18058 INFO TEST-START | Shutdown
02:06:10     INFO -  11-25 10:04:23.595   702   702 I GeckoDump: 18059 INFO Passed:  31704
02:06:10     INFO -  11-25 10:04:23.595   702   702 I GeckoDump: 18060 INFO Failed:  0
02:06:10     INFO -  11-25 10:04:23.610   702   702 I GeckoDump: 18061 INFO Todo:    578
02:06:10     INFO -  11-25 10:04:23.625   702   702 I GeckoDump: 18062 INFO Slowest: 164922ms - /tests/dom/tests/mochitest/ajax/jquery/test_jQuery.html
02:06:10     INFO -  11-25 10:04:23.655   702   702 I GeckoDump: 18063 INFO SimpleTest FINISHED
02:06:10     INFO -  11-25 10:04:23.675   702   702 I GeckoDump: 18064 INFO TEST-INFO | Ran 1 Loops
02:06:10     INFO -  11-25 10:04:23.675   702   702 I GeckoDump: 18065 INFO SimpleTest FINISHED
02:06:10     INFO -  11-25 10:04:26.505   662   662 I Gecko:ProcessPriorityManager: [Mochitest, child-id=1, pid=702] Scheduling reset timer to fire in 1000ms.
02:06:10     INFO -  11-25 10:04:26.535   662   683 E Profiler: BPUnw: [1 total] thread_unregister_for_profiling(me=0x22b10)  (NOT REGISTERED)
02:06:10     INFO -  11-25 10:04:26.625   662   827 E Profiler: BPUnw: [1 total] thread_unregister_for_profiling(me=0x22fe0)  (NOT REGISTERED)
02:06:10     INFO -  11-25 10:04:26.745   662   662 I Gecko:ProcessPriorityManager: [Mochitest, child-id=1, pid=702] ScheduleResetPriority bailing; the timer is already running.
02:06:10     INFO -  11-25 10:04:26.755   662   662 I Gecko:ProcessPriorityManager: Making background LRU pool with size(31)
02:06:10     INFO -  11-25 10:04:26.755   662   662 I Gecko:ProcessPriorityManager: [child-id=1, pid=-1] Destroying ParticularProcessPriorityManager.
02:06:10     INFO -  11-25 10:04:28.626    33    33 I ServiceManager: service 'media.resource_manager' died

It looks like the tests all completed successfully, and that mochitest was in the the process of shutting down when the mediaserver died.

So maybe something was opened but not closed and that then caused the media server to die?
Flags: needinfo?(dhylands)
Thanks for the comment! 'media.resource_manager' are not media server. It is instantiated in b2g process. Do you know if b2g process is closed at these timing? I recognized that these services Shutdown do not implemented yet. If b2g process is shutdown, but service manager is still in active, the service manager detects 'media.resource_manager'  died.
In the most recent instances of all these failures, the tests are passing and completing successfully, but for some reason the mozharness script isn't reporting this correctly.  This looks like a stdout buffering/threading problem, e.g., 

13:55:41     INFO -  39652 INFO TEST-START | Shutdown
13:55:41     INFO -  39653 INFO Passed:  37433
13:56:52     INFO -  39654 Mochitest INFO | runtestsb2g.py | Running tests: end.
13:57:53     INFO -  INFO Failed:  0
13:57:53     INFO -  39655 INFO Todo:    157
13:57:53     INFO -  39656 INFO Slowest: 57602ms - /tests/content/html/content/test/forms/test_input_typing_sanitization.html
13:57:53     INFO -  39657 INFO SimpleTest FINISHED
13:57:53     INFO -  39658 INFO TEST-INFO | Ran 1 Loops
13:57:53     INFO -  39659 INFO SimpleTest FINISHED
Assignee: sotaro.ikeda.g → nobody
We haven't seen this in a while.  I think this message indicated a crash, which is now being reported correctly, and so we can close this bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: