Perma [tier2] [fission] awsy/test_memory_usage.py TestMemoryUsage.test_open_tabs | AssertionError: Checkpoint was recorded
Categories
(Testing :: AWSY, defect, P3)
Tracking
(Fission Milestone:M6b, firefox79 disabled, firefox80 disabled)
People
(Reporter: intermittent-bug-filer, Assigned: mccr8)
References
(Regression)
Details
(Keywords: assertion, intermittent-failure)
Attachments
(1 file)
Filed by: archaeopteryx [at] coole-files.de
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=303938963&repo=try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/P-cMCTMtR6WYTqmbvekTlQ/runs/0/artifacts/public/logs/live_backing.log
[task 2020-05-27T14:37:28.486Z] 14:37:28 INFO - starting checkpoint TabsClosedExtraProcesses...
[task 2020-05-27T14:37:28.720Z] 14:37:28 INFO - checkpoint created, stored in Z:\task_1590589260\build\tests\results\memory-report-TabsClosedExtraProcesses-0.json.gz
[task 2020-05-27T14:37:28.721Z] 14:37:28 INFO - closing preloaded browser
[task 2020-05-27T14:37:28.727Z] 14:37:28 INFO - starting checkpoint TabsClosed...
[task 2020-05-27T14:37:28.956Z] 14:37:28 INFO - checkpoint created, stored in Z:\task_1590589260\build\tests\results\memory-report-TabsClosed-0.json.gz
[task 2020-05-27T14:37:58.961Z] 14:37:58 INFO - starting checkpoint TabsClosedSettled...
[task 2020-05-27T14:37:58.962Z] 14:37:58 ERROR - Unexpected error: <type 'exceptions.IOError'>
[task 2020-05-27T14:37:58.963Z] 14:37:58 INFO - TEST-UNEXPECTED-FAIL | awsy/test_memory_usage.py TestMemoryUsage.test_open_tabs | AssertionError: Checkpoint was recorded
[task 2020-05-27T14:37:58.963Z] 14:37:58 INFO - Traceback (most recent call last):
[task 2020-05-27T14:37:58.963Z] 14:37:58 INFO - File "z:\task_1590589260\build\venv\lib\site-packages\marionette_harness\marionette_test\testcases.py", line 159, in run
[task 2020-05-27T14:37:58.963Z] 14:37:58 INFO - testMethod()
[task 2020-05-27T14:37:58.963Z] 14:37:58 INFO - File "Z:\task_1590589260\build\tests\awsy\awsy\test_memory_usage.py", line 237, in test_open_tabs
[task 2020-05-27T14:37:58.964Z] 14:37:58 INFO - create_checkpoint("TabsClosedSettled", itr)
[task 2020-05-27T14:37:58.964Z] 14:37:58 INFO - File "Z:\task_1590589260\build\tests\awsy\awsy\test_memory_usage.py", line 201, in create_checkpoint
[task 2020-05-27T14:37:58.964Z] 14:37:58 INFO - self.assertIsNotNone(checkpoint, "Checkpoint was recorded")
[task 2020-05-27T14:37:58.964Z] 14:37:58 INFO - TEST-INFO took 586297ms
Comment hidden (Intermittent Failures Robot)
Updated•4 years ago
Comment 8•4 years ago
:cpeterson, would someone from the fission team be able to look into this? This is only failing on fission tests.
Comment 9•4 years ago
(In reply to Greg Mierzwinski [:sparky] from comment #8)
> :cpeterson, would someone from the fission team be able to look into this? This is only failing on fission tests.
Thanks for the heads up. I'll send this bug to Fission triage for review.
Looks like the test has been failing 10-20 times a week, but only on windows10-64-shippable-qr.
Test failure from https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=309610455&repo=mozilla-central&lineNumber=1106
[task 2020-07-12T11:36:51.595Z] 11:36:51 INFO - starting checkpoint Start...
[task 2020-07-12T11:36:53.297Z] 11:36:53 ERROR - Checkpoint JavaScript error: JavaScriptError: Document was unloaded
[task 2020-07-12T11:36:53.297Z] 11:36:53 ERROR - stacktrace:
[task 2020-07-12T11:36:53.297Z] 11:36:53 ERROR - WebDriverError@chrome://marionette/content/error.js:175:5
[task 2020-07-12T11:36:53.297Z] 11:36:53 ERROR - JavaScriptError@chrome://marionette/content/error.js:354:5
[task 2020-07-12T11:36:53.297Z] 11:36:53 ERROR - evaluate.sandbox/promise</unloadHandler<@chrome://marionette/content/evaluate.js:132:20
[task 2020-07-12T11:36:53.297Z] 11:36:53 ERROR -
[task 2020-07-12T11:36:53.327Z] 11:36:53 WARNING - Failed to gather test failure debug: Browsing context has been discarded
[task 2020-07-12T11:36:53.327Z] 11:36:53 WARNING - stacktrace:
[task 2020-07-12T11:36:53.327Z] 11:36:53 WARNING - WebDriverError@chrome://marionette/content/error.js:175:5
[task 2020-07-12T11:36:53.327Z] 11:36:53 WARNING - NoSuchWindowError@chrome://marionette/content/error.js:409:5
[task 2020-07-12T11:36:53.327Z] 11:36:53 WARNING - assert.that/<@chrome://marionette/content/assert.js:479:13
[task 2020-07-12T11:36:53.327Z] 11:36:53 WARNING - assert.open@chrome://marionette/content/assert.js:191:4
[task 2020-07-12T11:36:53.327Z] 11:36:53 WARNING - GeckoDriver.prototype.takeScreenshot@chrome://marionette/content/driver.js:3034:20
[task 2020-07-12T11:36:53.327Z] 11:36:53 WARNING - despatch@chrome://marionette/content/server.js:305:40
[task 2020-07-12T11:36:53.327Z] 11:36:53 WARNING - execute@chrome://marionette/content/server.js:275:16
[task 2020-07-12T11:36:53.328Z] 11:36:53 WARNING - onPacket/<@chrome://marionette/content/server.js:248:20
[task 2020-07-12T11:36:53.328Z] 11:36:53 WARNING - onPacket@chrome://marionette/content/server.js:249:9
[task 2020-07-12T11:36:53.328Z] 11:36:53 WARNING - _onJSONObjectReady/<@chrome://marionette/content/transport.js:501:20
[task 2020-07-12T11:36:53.328Z] 11:36:53 WARNING -
[task 2020-07-12T11:36:53.328Z] 11:36:53 INFO - TEST-UNEXPECTED-FAIL | awsy/test_memory_usage.py TestMemoryUsage.test_open_tabs | AssertionError: Checkpoint was recorded
[task 2020-07-12T11:36:53.329Z] 11:36:53 INFO - Traceback (most recent call last):
[task 2020-07-12T11:36:53.329Z] 11:36:53 INFO - File "z:\task_1594552757\build\venv\lib\site-packages\marionette_harness\marionette_test\testcases.py", line 196, in run
[task 2020-07-12T11:36:53.329Z] 11:36:53 INFO - testMethod()
[task 2020-07-12T11:36:53.329Z] 11:36:53 INFO - File "Z:\task_1594552757\build\tests\awsy\awsy\test_memory_usage.py", line 206, in test_open_tabs
[task 2020-07-12T11:36:53.329Z] 11:36:53 INFO - create_checkpoint("Start", 0)
[task 2020-07-12T11:36:53.329Z] 11:36:53 INFO - File "Z:\task_1594552757\build\tests\awsy\awsy\test_memory_usage.py", line 201, in create_checkpoint
[task 2020-07-12T11:36:53.329Z] 11:36:53 INFO - self.assertIsNotNone(checkpoint, "Checkpoint was recorded")
[task 2020-07-12T11:36:53.329Z] 11:36:53 INFO - TEST-INFO took 21099ms
Comment 10•4 years ago
This is perma-failing on tier2 starting with this merge https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=d4c6cd2e13bb057eb4d5f8c579f440936bf17241
Comment 11•4 years ago
Andrew, can you please take a look at this AWSY timeout?
kmag says the test is probably timing out because the memory report generation is taking too long.
This bug is a Fission Nightly blocker. We'd like to fix this soon, so tracking for milestone M6b.
Assignee
Updated•4 years ago
Assignee
Comment 12•4 years ago
This is Windows-only, which is odd. Also, when it was working, generating the report was only taking 9 seconds, and after the failure it isn't finishing even after a minute, so something must be going pretty wrong.
Assignee
Comment 13•4 years ago
The gecko.log files look similar until you start getting a ton of these:
###!!! [Child][MessageChannel] Error: (msgtype=0x390118,name=PContent::Msg_AddMemoryReport) Channel closing: too late to send/recv, messages will be lost
This indicates that the process has already started to shut down by the time many of the reporters are able to report.
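The dropped-message behavior can be sketched in Python (a hypothetical illustration, not Gecko code: `Channel`, `run_reporters`, and the report names are all invented for this sketch): once a channel begins closing, every later `AddMemoryReport`-style send is lost, so any reporter that runs after shutdown starts never reaches the parent.

```python
# Hypothetical sketch (not Gecko code) of why late AddMemoryReport messages
# are dropped: once the IPC channel starts closing, sends silently fail.

class Channel:
    def __init__(self):
        self.closing = False
        self.delivered = []

    def send(self, msg):
        if self.closing:
            # Mirrors "Channel closing: too late to send/recv,
            # messages will be lost"
            return False
        self.delivered.append(msg)
        return True

def run_reporters(channel, reports, shutdown_after):
    """Send each memory report; the channel starts closing partway through."""
    lost = 0
    for i, report in enumerate(reports):
        if i == shutdown_after:
            channel.closing = True  # child process begins shutting down
        if not channel.send(("AddMemoryReport", report)):
            lost += 1
    return lost

channel = Channel()
lost = run_reporters(channel, ["heap", "js", "images", "dom"], shutdown_after=2)
print(len(channel.delivered), lost)  # 2 2
```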
Assignee
Comment 14•4 years ago
(In reply to Andreea Pavel [:apavel] from comment #10)
> This is perma-failing on tier2 starting with this merge https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=d4c6cd2e13bb057eb4d5f8c579f440936bf17241
I've done a ton of retriggers, and I think that isn't quite right. I think the failure rate has just been very high for a while. For instance, here are some retriggers from a push on Sunday: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=shippable%2Copt%2Care%2Cwe%2Cslim%2Cyet%2Ctests%2Cby%2Ctaskcluster%2C%2Cfission%2Cenabled%2Csy-tp6&revision=22f5f7e914449bde3bee21a322d67248cd5e0254
Here are some retriggers from a push on autoland, from before that merge: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=windows%2C10%2Cshippable%2Copt%2Care%2Cwe%2Cslim%2Cyet%2Ctests%2Cby%2Ctaskcluster%2C%2Cfission%2Cenabled%2Csy-tp6&revision=d080e46635d3e8b163427f68282611aa6392e59f
It is also possible that there was some infrastructure change not reflected in the tree that has caused the failure rate to go up, but I'm not sure how I'd be able to figure that out.
Assignee
Comment 15•4 years ago
The failure rate on this test is so high that I think it would be okay to disable it on Windows+Fission until I fix this problem. We don't seem to hit the failures on Linux, so I'd like to leave it enabled there.
I found a few things that AWSY could do better (which I'll file bugs for if I think they are worthwhile), but my guess is that this is a regression from bug 1602757. The timing lines up perfectly, too. This theory is based on my observation that one or more child processes are failing to send their reports. What could cause a child process to shut down before the report is done, if the test harness is just sitting around waiting for the reports?
Well...
- Before this specific AWSY checkpoint, it calls do_full_gc(), which notifies "child-mmu-request" observers.
- This in turn sends a message to every child process, which causes all of them to notify "memory-pressure" observers with the "heap-minimize" value.
- One of the observers is the preallocated process manager. What does it do when it finds out we're under memory pressure? Why, it shuts down the process, of course!
Presumably there's some kind of race between the parent process trying to collect reports from every child and the preallocated process telling the parent that it is shutting down. I'll try to confirm this theory with some try pushes and then figure out what the best fix is here.
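The suspected race can be sketched as a tiny simulation (everything here — `ReportCollector`, `memory_pressure`, the process labels — is invented for illustration, not a Gecko API): if the preallocated process dies in response to memory pressure and the collector is never told, collection can never finish.

```python
# Hypothetical simulation (not Gecko code) of the suspected race: the parent
# waits for a report from every live child, then memory pressure makes the
# preallocated process manager shut its process down. That process never
# reports, so the parent waits until the harness times out.

class ReportCollector:
    def __init__(self, children):
        self.pending = set(children)

    def on_report(self, child):
        self.pending.discard(child)

    def done(self):
        return not self.pending

def memory_pressure(preallocated, collector, handle_death=False):
    # The preallocated process is shut down on "memory-pressure".
    # Without a death notification, the collector keeps waiting
    # for a report that will never arrive.
    if handle_death:
        collector.on_report(preallocated)  # EndProcessReport-style cleanup

collector = ReportCollector({"tab1", "tab2", "preallocated"})
memory_pressure("preallocated", collector, handle_death=False)
collector.on_report("tab1")
collector.on_report("tab2")
print(collector.done())  # False: still waiting on the dead preallocated process
```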
Comment 16•4 years ago
:alexandrui, could you disable this awsy fission test on windows temporarily?
Updated•4 years ago
Comment 18•4 years ago
Updated•4 years ago
Comment 19•4 years ago
Pushed by apavel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d932b3b69adc disable test_memory_usage.py on win fission r=sparky
Comment 20•4 years ago
Pushed by apavel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3e4e8032b9e0 delete extra space and add missing one on a CLOSED TREE
Comment 21•4 years ago
Pushed by apavel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e5f48da8a203 delete tab spacing on a CLOSED TREE
Comment 22•4 years ago
bugherder
Assignee
Comment 23•4 years ago
I guess my analysis in comment 15 isn't right. After a bit of a detour to get a browser chrome test for about:memory, it appears that what normally happens, at least on OSX, is that if you kill a process while a memory report is underway, the BrowserParent for the child process ends up dying. This clears the mMemoryReportRequest field, which makes us run ~MemoryReportRequestHost(), which calls nsMemoryReporterManager::EndProcessReport, which then makes the manager stop waiting for a memory report from that process.
So there must be something subtler going on in the case of the Windows failure. Maybe the ContentParent isn't going away due to something about the way Marionette is running code or the specific way that the GC is invoked.
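The cleanup chain described in the previous comment can be sketched as a hypothetical Python model (`ReporterManager` and `MemoryReportRequestHost` here are invented stand-ins, not the real C++ classes): as long as each dead process's per-process request object is torn down, the manager's outstanding count drains to zero and it stops waiting.

```python
# Hypothetical sketch (not Gecko code) of the normal cleanup chain: when a
# child dies mid-report, tearing down its per-process request object (the
# ~MemoryReportRequestHost step) calls EndProcessReport, so the manager stops
# waiting for it. The Windows failure suggests this teardown is not running.

class ReporterManager:
    def __init__(self, num_processes):
        self.outstanding = num_processes

    def end_process_report(self):
        self.outstanding -= 1

class MemoryReportRequestHost:
    """Per-process request; its teardown always notifies the manager."""
    def __init__(self, manager):
        self.manager = manager
        self.finished = False

    def finish(self):  # stands in for the destructor running
        if not self.finished:
            self.finished = True
            self.manager.end_process_report()

manager = ReporterManager(num_processes=3)
requests = [MemoryReportRequestHost(manager) for _ in range(3)]
requests[0].finish()        # process reported normally
requests[1].finish()        # process died; its teardown still ran
print(manager.outstanding)  # 1: still waiting on one live process
```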
Assignee
Comment 25•4 years ago
I have reenabled this test in bug 1655536 and it is working again: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=awsy%2Cwin&revision=fdfd1e91d204eac485f83445935ff69c283b9b8d
Updated•4 years ago