Closed
Bug 775227
Opened 12 years ago
Closed 5 years ago
OutOfMemory or "out of memory" during mochitest
Categories
(Testing :: General, defect)
Tracking
(Not tracked)
RESOLVED
INACTIVE
People
(Reporter: gbrown, Assigned: gbrown)
References
Details
(Keywords: intermittent-failure, Whiteboard: [test disabled][leave open])
Attachments
(4 files, 1 obsolete file)
- 1.44 KB, patch (jmaher: review+)
- 9.13 KB, patch (jmaher: review+)
- 1.60 KB, patch (jmaher: review+)
- 2.80 KB, patch (jmaher: review+)
Many of the logs reported in bug 747787 - crash during android mochitests - show "out of memory" or "OutOfMemoryException" in the logcats:
https://tbpl.mozilla.org/php/getParsedLog.php?id=12729578&tree=Ionmonkey contains
I/GeckoDump( 1710): 82913 INFO TEST-START | /tests/layout/style/test/test_value_cloning.html
I/GeckoScreenshot( 1710): rect: -0.316666, 11.000000, 376.549988, 56800.683594
I/GeckoTabs( 1710): Got message: SessionHistory:New
I/GeckoTabs( 1710): Got message: SessionHistory:Purge
I/GeckoScreenshot( 1710): rect: 207.683334, 13.000000, 513.000000, 507.683350
I/GeckoScreenshot( 1710): rect: 0.000000, 0.000000, 500.000000, 300.000000
I/GeckoScreenshot( 1710): rect: 0.000000, 0.000000, 300.000000, 150.000000
I/dalvikvm-heap( 1710): Clamp target GC heap from 16.109MB to 16.000MB
I/dalvikvm-heap( 1710): Forcing collection of SoftReferences for 923572-byte allocation
I/dalvikvm-heap( 1710): Clamp target GC heap from 16.081MB to 16.000MB
E/dalvikvm-heap( 1710): Out of memory on a 923572-byte allocation.
I/WindowManager( 1017): WIN DEATH: Window{445f2f40 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
I/ActivityManager( 1017): Process org.mozilla.fennec (pid 1710) has died.
I/WindowManager( 1017): WIN DEATH: Window{4463b258 SurfaceView paused=false}
https://tbpl.mozilla.org/php/getParsedLog.php?id=12685785&tree=Mozilla-Inbound contains
I/GeckoDump( 1716): 30732 INFO TEST-PASS | /tests/layout/style/test/test_parse_rule.html | .a::before {content: 'This is \a'}#a ::before {content: 'FAIL'} - "This is \A " should equal "This is \A "
I/GeckoTabs( 1716): Got message: SessionHistory:New
I/GeckoTabs( 1716): Got message: SessionHistory:Purge
E/GeckoConsole( 1716): [JavaScript Warning: "Unexpected end of file while searching for ';' or '}' to end declaration. Unexpected end of file while searching for closing } of declaration block." {file: "http://mochi.test:8888/tests/layout/style/test/test_parse_rule.html" line: 1}]
I/GeckoScreenshot( 1716): rect: 217.683334, 23.000000, 323.000000, 367.683350
I/GeckoScreenshot( 1716): rect: 10.000000, 10.000000, 310.000000, 160.000000
I/GeckoScreenshot( 1716): rect: 0.000000, 0.000000, 300.000000, 150.000000
I/GeckoDump( 1716): 30733 INFO TEST-PASS | /tests/layout/style/test/test_parse_rule.html | #a::before {content: 'This is \a - "This is \A " should equal "This is \A "
I/GeckoDump( 1716): 30734 INFO TEST-PASS | /tests/layout/style/test/test_parse_rule.html | Sanity check for rule: div {background: blue} - transparent should not equal rgb(0, 0, 255)
I/GeckoTabs( 1716): Got message: SessionHistory:New
I/GeckoTabs( 1716): Got message: SessionHistory:Purge
I/GeckoScreenshot( 1716): rect: 217.683334, 23.000000, 323.000000, 367.683350
I/GeckoScreenshot( 1716): rect: 10.000000, 10.000000, 310.000000, 160.000000
I/GeckoScreenshot( 1716): rect: 0.000000, 0.000000, 300.000000, 150.000000
I/dalvikvm-heap( 1716): Forcing collection of SoftReferences for 1486108-byte allocation
E/dalvikvm-heap( 1716): Out of memory on a 1486108-byte allocation.
E/GeckoAppShell( 1716): >>> REPORTING UNCAUGHT EXCEPTION FROM THREAD 9 ("GeckoBackgroundThread")
E/GeckoAppShell( 1716): java.lang.OutOfMemoryError
E/GeckoAppShell( 1716): at java.nio.ByteBuffer.put(ByteBuffer.java:815)
E/GeckoAppShell( 1716): at org.mozilla.gecko.gfx.ScreenshotLayer$ScreenshotImage.copyBuffer(ScreenshotLayer.java:138)
E/GeckoAppShell( 1716): at org.mozilla.gecko.gfx.ScreenshotLayer$ScreenshotImage.setBitmap(ScreenshotLayer.java:144)
E/GeckoAppShell( 1716): at org.mozilla.gecko.gfx.ScreenshotLayer.setBitmap(ScreenshotLayer.java:53)
E/GeckoAppShell( 1716): at org.mozilla.gecko.gfx.LayerRenderer.setCheckerboardBitmap(LayerRenderer.java:138)
E/GeckoAppShell( 1716): at org.mozilla.gecko.ScreenshotHandler$1.run(GeckoAppShell.java:2338)
E/GeckoAppShell( 1716): at android.os.Handler.handleCallback(Handler.java:587)
E/GeckoAppShell( 1716): at android.os.Handler.dispatchMessage(Handler.java:92)
E/GeckoAppShell( 1716): at android.os.Looper.loop(Looper.java:123)
E/GeckoAppShell( 1716): at org.mozilla.gecko.GeckoBackgroundThread.run(GeckoBackgroundThread.java:31)
I/WindowManager( 1017): WIN DEATH: Window{4444b6d0 SurfaceView paused=false}
I/ActivityManager( 1017): Process org.mozilla.fennec (pid 1716) has died.
I/WindowManager( 1017): WIN DEATH: Window{44421f10 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
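The two OOM signatures in the excerpts above (the dalvikvm-heap "Out of memory on a N-byte allocation" line and the `java.lang.OutOfMemoryError` report) can be picked out of a logcat mechanically. The following is only a minimal sketch: the function name `find_oom_events` and the returned dictionary layout are hypothetical, and the patterns are inferred from the lines quoted in this bug rather than from any logcat specification.

```python
import re

# Patterns inferred from the logcat excerpts quoted in this bug (an assumption,
# not an exhaustive list of how Android reports out-of-memory conditions).
DALVIK_OOM = re.compile(
    r"E/dalvikvm-heap\(\s*(\d+)\): Out of memory on a (\d+)-byte allocation"
)
JAVA_OOM = re.compile(r"\(\s*(\d+)\): java\.lang\.OutOfMemoryError")


def find_oom_events(logcat_text):
    """Return one dict per OOM line found, in the order the lines appear."""
    events = []
    for line in logcat_text.splitlines():
        match = DALVIK_OOM.search(line)
        if match:
            events.append(
                {"pid": int(match.group(1)), "kind": "dalvik", "bytes": int(match.group(2))}
            )
            continue
        match = JAVA_OOM.search(line)
        if match:
            # The Java exception line carries no allocation size.
            events.append({"pid": int(match.group(1)), "kind": "java", "bytes": None})
    return events
```

Run over a saved logcat, this gives the pid and (where reported) the size of each failed allocation, which makes it easier to compare failures across logs.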
Comment 1•12 years ago (Assignee)
Bug 766584 shows OOM in:
/tests/layout/style/test/test_transitions_computed_values.html,
/tests/layout/style/test/test_transitions_and_zoom.html, and
/tests/layout/style/test/test_value_storage.html
...all during screenshotting.
Comment 2•12 years ago (Assignee)
Bug 775053 shows OOM in:
/tests/dom/tests/mochitest/bugs/test_bug260264_nested.html
Comment 3•12 years ago (Assignee)
Despite the various stack traces showing OOM during a screenshot, the problem may well be elsewhere. Consider:
- bug 757468 also had stack traces showing OOM during a screenshot, but the problem was elsewhere
- screenshots are one of the largest allocations made -- if there is a memory leak on a smaller allocation somewhere, we would still expect to see most OOMs reported in the screenshot
- the pageload-stress test that gave rise to bug 757468 can now run for over 1000 page loads (ie, many screenshots) without OOM
- OOM failures during mochitests have persisted despite many significant changes to screenshot implementation
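The second point can be illustrated with a toy model. This is only a sketch under stated assumptions: the 16 MB limit and the 923572-byte screenshot size are taken from the logcat in the description, while the 64 KB leak per test, the 4 KB "small" allocation, and the function name `first_failing_allocation` are invented for illustration. It does not model Dalvik or Gecko memory management.

```python
HEAP_LIMIT = 16 * 1024 * 1024  # the 16.000MB clamp seen in the logcat


def first_failing_allocation(leak_per_test=64 * 1024,
                             screenshot_bytes=923572,
                             small_bytes=4096):
    """Simulate tests that each leak a little memory and report which
    transient allocation is the first one that no longer fits."""
    leaked = 0
    test_number = 0
    while True:
        test_number += 1
        # Each test makes a small and a large transient allocation; both are
        # freed again, so only the leak accumulates from test to test.
        for name, size in (("small", small_bytes), ("screenshot", screenshot_bytes)):
            if leaked + size > HEAP_LIMIT:
                return name, test_number, leaked
        leaked += leak_per_test


name, test_number, leaked = first_failing_allocation()
print(name, test_number, leaked)
```

In this model the screenshot is always the allocation that fails, even though the leak is elsewhere, because the large request stops fitting long before the small one does.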
Comment 4•12 years ago (Assignee)
I ran mochitests locally, with TEST_PATH=layout/style/test and monitored heap usage with ddms. After a few minutes, I started seeing logcat messages:
I/ActivityManager( 3098): Low Memory: No more background processes.
E/GeckoApp(13688): low memory
and noted Android killing off many processes (but fennec and sutagent survived). ddms reported the fennec heap size as about 9M (5 alloc/4 free); sutagent about 4M (3/1). ddms sysinfo / memory (PSS in KB graph) shows 75% "unknown". Interesting...where is the memory being used?
Comment 5•12 years ago (Assignee)
The test results in Comment 4 apply to the Galaxy S (Froyo).
I repeated the same test on a Galaxy Nexus (ICS):
- no low memory warnings in logcat
- no processes killed
- ddms sysinfo / memory (PSS in KB graph) shows fennec using about 20% of the graph
- ddms reports heap size at about 18M (15 alloc/3 free)
Comment 6•12 years ago (Assignee)
I repeated the same test on a local Tegra. There were no memory warnings in the logcat, 130000+ test results were reported, and then the Tegra rebooted!
Comment 7•12 years ago (Assignee)
Summary of additional test results from Galaxy S for layout/style/test:
- Low Memory persists for a variety of conditions including:
- screenshots disabled
- favicon bitmap creation disabled
- SUT or ADB devicemanagers
- typically, the Android low memory killer kills off a significant number of processes; usually that includes the provider process; sometimes the sutagent is also killed
- ** Low Memory does not occur and layout/style/test mochitests run to completion without error if test_transitions_per_property.html is disabled **
Try run without test_transitions_per_property: https://tbpl.mozilla.org/?tree=Try&rev=318ebf99823d ...
Comment 8•12 years ago (Assignee)
Comment 9•12 years ago (Assignee)
I am having trouble demonstrating an improvement to mochitest runs on try (other tests fail), but my local results are consistent, so I would like to check this in to disable test_transitions_per_property while I investigate other mochitest errors.
Attachment #644509 -
Attachment is obsolete: true
Attachment #644810 -
Flags: review?(jmaher)
Comment 10•12 years ago
Comment on attachment 644810 [details] [diff] [review]
disable test_transitions_per_property to avoid OOM
Review of attachment 644810 [details] [diff] [review]:
-----------------------------------------------------------------
good stuff, make sure we leave this bug open with the whiteboard as: [orange][test disabled][leave open]
Attachment #644810 -
Flags: review?(jmaher) → review+
Comment 11•12 years ago (Assignee)
Whiteboard: [orange][test disabled][leave open]
Comment 13•12 years ago
Comment 14•12 years ago (Assignee)
I have found a few more tests that cause Low Memory / OOM / tegra reboot -- I will try to complete a comprehensive list today.
Comment 15•12 years ago (Assignee)
This is taking longer than anticipated...I have identified about 20 OOM tests now.
Comment 16•12 years ago (Assignee)
With these additional tests disabled, I can run all of the mochitests - in 8 chunks - on my Galaxy S with good reliability (3 straight runs for each chunk without test failure due to OOM). Even so, there are still signs of low memory -- the Android low memory killer often kills off a few processes during the tests, but not so aggressively that sutagent or fennec are killed.
Results from Try are inconclusive. I have only managed 6 runs of M3 on try; of these, 3 were green, 1 failed due to reboot, 1 failed possibly due to reboot, and 1 failed for an unrelated reason.
Attachment #646221 -
Flags: review?(jmaher)
Comment 17•12 years ago
Comment on attachment 646221 [details] [diff] [review]
disable 18 additional tests
Review of attachment 646221 [details] [diff] [review]:
-----------------------------------------------------------------
::: testing/mochitest/android.json
@@ +33,5 @@
> "content/base/test/websocket_hybi/test_receive-arraybuffer.html": "",
> "content/base/test/websocket_hybi/test_receive-blob.html": "",
> "content/base/test/websocket_hybi/test_send-arraybuffer.html": "",
> "content/base/test/websocket_hybi/test_send-blob.html": "",
> + "content/canvas/test/webgl/test_webgl_conformance_test_suite.html": "bug 775227",
this specific test has its own manifest with rules to ignore tests on each platform. I would like somebody who owns this test to be aware of the memory impact this test has.
Attachment #646221 -
Flags: review?(jmaher) → review+
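The excerpt of testing/mochitest/android.json quoted in the review shows entries of the form "test path": "reason". As a rough illustration of how such a list can be applied, here is a minimal sketch. It assumes a flat JSON object of path-to-reason pairs; the real file's top-level layout and the way the harness consumes it are not shown in this bug, and the names `load_disabled` and `split_tests` are hypothetical.

```python
import json


def load_disabled(manifest_text):
    """Parse a JSON object mapping test path -> reason.
    An empty reason string means no reason was recorded."""
    return json.loads(manifest_text)


def split_tests(all_tests, disabled):
    """Separate tests to run from tests skipped by the manifest."""
    to_run, skipped = [], []
    for path in all_tests:
        if path in disabled:
            skipped.append((path, disabled[path] or "no reason recorded"))
        else:
            to_run.append(path)
    return to_run, skipped


manifest = """{
  "content/base/test/websocket_hybi/test_send-blob.html": "",
  "content/canvas/test/webgl/test_webgl_conformance_test_suite.html": "bug 775227"
}"""
to_run, skipped = split_tests(
    [
        "content/canvas/test/webgl/test_webgl_conformance_test_suite.html",
        "layout/style/test/test_value_cloning.html",
    ],
    load_disabled(manifest),
)
for path, reason in skipped:
    print("skipping", path, "(" + reason + ")")
```

Recording the bug number as the reason, as the patch does, keeps a pointer back to the bug for anyone who later wants to re-enable the test.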
Comment 18•12 years ago (Assignee)
Comment 19•12 years ago (Assignee)
(In reply to Joel Maher (:jmahe
> this specific test has it's own manifest with rules to ignore tests on each
> platform. I would like somebody who owns this test to be aware of the
> memory impact this test has.
I have opened additional bugs to track progress on re-enabling each test and draw attention to the memory issue. See bugs 778003, 778006, 778008, 778009, 778010, 778011, 778014, 778016, 778017, 778018, and 778019.
Updated•12 years ago
Comment 20•12 years ago
Comment 21•12 years ago (Assignee)
Some tests continue to fail on tbpl. I notice this pattern on many M3 runs:
1996 INFO TEST-START | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Infraware/test_event_session_oldvalue.html
1997 INFO TEST-PASS | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Infraware/test_event_session_oldvalue.html | Elided 3 passes or known failures.
1998 INFO TEST-END | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Infraware/test_event_session_oldvalue.html | finished in 1060ms
1999 INFO TEST-START | /tests/dom/imptests/webapps/WebStorage/tests/submiss
command timed out: 2400 seconds without output, killing pid 2496
Comment 22•12 years ago
ooh, there is a bug on it:
https://bugzilla.mozilla.org/show_bug.cgi?id=772632
I find it disturbing how out-of-band the information that these tests are disabled on android is. Normally tests are disabled either with javascript that lives inside the test, or by conditionally listing the test in the makefile. Listing these in a central place makes it hard to discover that a particular test is disabled. Could one of the other mechanisms be used instead?
Comment 24•12 years ago
I find disabling in the makefiles, or inside the tests themselves, to be out of band. The tests themselves read android.json as the primary source for what to run. Personally I find a single file much easier to read when figuring out what is disabled than doing a few different code greps to try to figure out what is disabled. This is really similar to how reftests are done.
I will admit, maybe I don't know the ideal method for doing this. If there are other examples of tests that are disabled, maybe I will see the more streamlined way.
The primary source for what to run is the makefiles that list the tests and install them, just like in reftest the primary source is the reftest.list files. (I prefer the reftest.list approach too, but that's not how we do mochitests.)
I actually wasn't aware (as style system module owner) that a bunch of the style system tests were disabled on Android until bug 778018 and bug 778019 were filed notifying me that two *more* had been disabled on Android.
That I could modify the tests themselves and the list of tests to run without knowing that a bunch of the tests are disabled is what seems "out of band" to me. It would have been trivial for me (or anyone else) to copy a test file to create a new and similar test, add that test, and end up with Android bustage because there was no sign at any point in the process that the test I was copying from was disabled on some platforms (and copying, or in fact renaming, a test wouldn't carry that information over).
One additional thought (echoing bug 772740 comment 9, which has a patch to do this): if memory use is the problem, perhaps we should GC between tests so that memory use is more deterministic (and we don't end up disabling, say, the test after the one that's actually responsible for using a lot of memory).
(Our GC algorithms aren't tuned for running mochitests, nor should they be; they should be tuned for what our users do.)
Updated•12 years ago
Comment 27•12 years ago (Assignee)
This disables one more test, in android.json.
The test is in dom/imptests, identified in bug 772632 as troublesome in general. Personally I'm seeing this test in particular causing failures, so I prefer to disable just this one test, rather than the whole directory, as suggested in bug 772632...at least for now.
IMO :dbaron makes a good case for disabling tests using a different mechanism, but I am reluctant to change mid-bug, so this patch also uses android.json, just like the other patches on this bug.
Attachment #647652 -
Flags: review?(jmaher)
Comment 28•12 years ago
Comment on attachment 647652 [details] [diff] [review]
disable test_event_session_key.html
Review of attachment 647652 [details] [diff] [review]:
-----------------------------------------------------------------
thanks. regarding the place to disable, I think that is a larger meta issue we need to figure out.
Attachment #647652 -
Flags: review?(jmaher) → review+
Comment 29•12 years ago (Assignee)
Comment 30•12 years ago (Assignee)
That last patch didn't have the effect I wanted it to on m-i. I may have confused test_storage_local_key and test_event_session_key...
Comment 31•12 years ago
Comment 32•12 years ago (Assignee)
Enable test_browserElement_inproc_Iconchange.html -- no longer any sign of OOM.
Enable test_event_session_key.html, which I seemed to confuse with...
Disable test_storage_local_key.html, which is currently failing nearly 100% on tbpl.
Once this is done, I still anticipate Android M3 failures in dom/indexedDB/ipc/test_ipc.html -- bug 770046 -- and possibly dom/indexedDB/test/test_event_propagation.html (need to look into that more).
Attachment #649046 -
Flags: review?(jmaher)
Comment 33•12 years ago
Comment on attachment 649046 [details] [diff] [review]
minor changes to android.json
Review of attachment 649046 [details] [diff] [review]:
-----------------------------------------------------------------
thanks!
Attachment #649046 -
Flags: review?(jmaher) → review+
Comment 34•12 years ago
Comment on attachment 649046 [details] [diff] [review]
minor changes to android.json
https://hg.mozilla.org/integration/mozilla-inbound/rev/b87232262212
Comment 35•12 years ago
Comment 36•12 years ago
Last 5 native m3 runs on inbound:
https://tbpl.mozilla.org/php/getParsedLog.php?id=14140899&tree=Firefox#error0
{
2330 INFO TEST-START | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Ms2ger/test_storage_session_index_js.html
2331 INFO TEST-PASS | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Ms2ger/test_stor
command timed out: 2400 seconds without output, killing pid 36288
}
https://tbpl.mozilla.org/php/getParsedLog.php?id=14145833&tree=Firefox#error0
{
2328 INFO TEST-END | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Ms2ger/test_storage_session_index_js.html | finished in 592ms
2329 INFO TEST-START | /t
command timed out: 2400 seconds without output, killing pid 12344
}
https://tbpl.mozilla.org/php/getParsedLog.php?id=14146712&tree=Firefox#error0
{
2331 INFO TEST-END | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Ms2ger/test_storage_session_length_js.html | finished in 578ms
2332 INFO TES
command timed out: 2400 seconds without output, killing pid 27397
}
https://tbpl.mozilla.org/php/getParsedLog.php?id=14150183&tree=Firefox#error0
{
2331 INFO TEST-END | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Ms2ger/test_storage_session_length_js.html | finished in 1009ms
2332 INFO TEST-START | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Ms2ger/test_storage_session_removeitem_js.html
2333 INFO TEST-PASS | /tests/dom/imptests/webapps/Web
command timed out: 2400 seconds without output, killing pid 2952
}
https://tbpl.mozilla.org/php/getParsedLog.php?id=14152659&tree=Firefox#error0
{
2331 INFO TEST-END | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Ms2ger/test_storage_session_length_js.html | finished in 593ms
2332 INFO TEST-START | /tests/dom/imptests/webapps/WebStorage/tests/submissions/Ms2ger/test_s
command timed out: 2400 seconds without output, killing pid 54676
}
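All five logs above end the same way: a TEST-START or partial test line, then "command timed out". A small script can pull the last fully logged test start out of each log so the runs can be compared. This is a minimal sketch with a hypothetical function name (`last_started_test`); the line formats are inferred from the excerpts in this bug. Because the final log line is often truncated, the result is the last test whose TEST-START line was written in full, which may be the test just before the one that actually hung.

```python
import re

# Only complete TEST-START lines (path ending in .html) are counted, so a
# line cut off mid-path by the timeout is ignored.
TEST_START = re.compile(r"INFO TEST-START \| (\S+\.html)\s*$")
TIMEOUT = re.compile(r"command timed out: (\d+) seconds without output")


def last_started_test(log_text):
    """Return the path of the last complete TEST-START line seen before the
    buildbot timeout message, or None if the log has no timeout."""
    last = None
    for line in log_text.splitlines():
        if TIMEOUT.search(line):
            return last
        match = TEST_START.search(line)
        if match:
            last = match.group(1)
    return None
```

Applied to the first log excerpt above, this returns the test_storage_session_index_js.html path; a log without a timeout message yields None.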
Comment 37•12 years ago
On native M1:
https://tbpl.mozilla.org/php/getParsedLog.php?id=14181377&tree=Mozilla-Inbound#error0
{
49129 INFO TEST-PASS | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/uniforms/uniform-location.html] Test passed - successfullyParsed is true
49130 INFO TEST-PASS | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/uniforms/uniform-location.html] All 27 test(s) passed
49131 INFO TEST-INFO | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/uniforms/uniform-samplers-test.html] (WebGL mochitest) Starting test page
49132 INFO TEST-PASS | /tests/content/canvas/test/webgl/test_webgl_conformance_test_suite.html | [conformance/unif
command timed out: 2400 seconds without output, killing pid 38107
process killed by signal 9
program finished with exit code -1
elapsedTime=3657.179387
TinderboxPrint: mochitest-plain<br/><em class="testfail">T-FAIL</em>
buildbot.slave.commands.TimeoutError: command timed out: 2400 seconds without output, killing pid 38107
}
Comment 38•12 years ago
Looking at that specific log, we time out in mochitest at 2400 seconds; the next step that talks to the device (at the bottom of the log) is reboot.py, which times out at 1800 seconds.
This is an indicator of a device that needs a PDU reboot or is fully dead.
Why we get into this state is unknown. I get into it locally when I get a DHCP renew, but I have tested that this doesn't cause problems on the production network. That was a year and a half ago though...maybe the sutagent acts differently.
The big thing to know is:
1) is the device pingable?
2) is the sutagent accepting new connections?
If both are false, then we should take the device offline.
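The two checks above could be scripted roughly as follows. This is a sketch only: the function names are hypothetical, the host and port of the sutagent must be supplied by the caller (no port number is given in this bug), and the ping check is left to the caller because how it is run depends on the host system.

```python
import socket


def accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def should_take_offline(pingable, agent_accepting):
    """Apply the rule stated above: offline only if both checks fail."""
    return not pingable and not agent_accepting
```

A caller would obtain `pingable` from whatever ping mechanism is available, call `accepts_connections` with the device address and the agent's port, and pass both results to `should_take_offline`.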
Comment 39•12 years ago
(In reply to Joel Maher (:jmaher) from comment #38)
> looking at that specific log we timeout in mochitest at 2400 seconds, then
> the next step that we talk to the device (at the bottom of the log) is
> reboot.py and we timeout with 1800 seconds.
>
> This is an indicator of a device that needs a PDU reboot or is fully dead.
>
> Why we get into this state is unknown.
This seems to be happening quite a bit on test_webgl_conformance_test_suite.html specifically though - are we sure the test isn't a contributing factor?
https://tbpl.mozilla.org/php/getParsedLog.php?id=14200997&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=14202596&tree=Mozilla-Inbound
Comment 40•12 years ago
the test could be a factor. I believe it could cause fennec to starve the memory of other applications (i.e. sut, system processes) and that could force us to hang.
Updated•12 years ago
Keywords: intermittent-failure
Updated•12 years ago
Whiteboard: [orange][test disabled][leave open] → [test disabled][leave open]
Comment 41•12 years ago (Assignee)
Dropping off as the assignee here as I am not finding time to check back on these.
Assignee: gbrown → nobody
Comment 42•11 years ago
Bug 885157 removed content/media/test/test_too_many_elements.html.
Updated•5 years ago (Assignee)
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → INACTIVE