Closed Bug 1494573 Opened 2 years ago Closed 2 years ago

Perma windows7-32-msvc tresize/windows7 | Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))]

Categories

(Testing :: Talos, defect, P1)

Version 3
defect

Tracking

(firefox64 fixed)

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: keeler)

References

Details

Attachments

(1 file)

Filed by: ebalazs [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=201859714&repo=mozilla-central

https://queue.taskcluster.net/v1/task/GmmnUNb0T0Cz_Lu_e1kHEg/runs/0/artifacts/public/logs/live_backing.log

TEST-INFO | started process 5976 (C:\Users\task_1538024168\build\application\firefox\firefox -wait-for-browser -no-deelevate -profile c:\users\task_1538024168\appdata\local\temp\tmpgw46po\profile http://localhost:49807/startup_test/tresize/addon/content/tresize-test.html)
05:03:01     INFO -  TEST-INFO | 5976: exit 1
05:03:08     INFO -  rmtree() failed for "('c:\\users\\task_1538024168\\appdata\\local\\temp\\tmpgw46po',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
05:03:08     INFO -  rmtree() failed for "('c:\\users\\task_1538024168\\appdata\\local\\temp\\tmpgw46po',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
05:03:08     INFO -  rmtree() failed for "('c:\\users\\task_1538024168\\appdata\\local\\temp\\tmpgw46po',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
05:03:08     INFO -  rmtree() failed for "('c:\\users\\task_1538024168\\appdata\\local\\temp\\tmpgw46po',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
05:03:08     INFO -  rmtree() failed for "('c:\\users\\task_1538024168\\appdata\\local\\temp\\tmpgw46po',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
05:03:08     INFO -  Exception while removing profile directory: c:\users\task_1538024168\appdata\local\temp\tmpgw46po
05:03:08     INFO -  [Error 32] The process cannot access the file because it is being used by another process: 'c:\\users\\task_1538024168\\appdata\\local\\temp\\tmpgw46po\\profile\\cert9.db'
05:03:08     INFO -  TEST-UNEXPECTED-ERROR | tresize | Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))]
05:03:08    ERROR -  Traceback (most recent call last):
05:03:08     INFO -    File "C:\Users\task_1538024168\build\tests\talos\talos\run_tests.py", line 303, in run_tests
05:03:08     INFO -      talos_results.add(mytest.runTest(browser_config, test))
05:03:08     INFO -    File "C:\Users\task_1538024168\build\tests\talos\talos\ttest.py", line 63, in runTest
05:03:08     INFO -      return self._runTest(browser_config, test_config, setup)
05:03:08     INFO -    File "C:\Users\task_1538024168\build\tests\talos\talos\ttest.py", line 277, in _runTest
05:03:08     INFO -      else None)
05:03:08     INFO -    File "C:\Users\task_1538024168\build\tests\talos\talos\results.py", line 95, in add
05:03:08     INFO -      global_counters=self.global_counters
05:03:08     INFO -    File "C:\Users\task_1538024168\build\tests\talos\talos\results.py", line 326, in __init__
05:03:08     INFO -      self.parse()
05:03:08     INFO -    File "C:\Users\task_1538024168\build\tests\talos\talos\results.py", line 353, in parse
05:03:08     INFO -      % self.report_tokens)
05:03:08     INFO -    File "C:\Users\task_1538024168\build\tests\talos\talos\results.py", line 337, in error
05:03:08     INFO -      raise utils.TalosError(message)
05:03:08     INFO -  TalosError: Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))]
05:03:08     INFO -  TEST-INFO took 40706ms
05:03:08     INFO -  SUITE-END | took 156s
05:03:48     INFO -  PID 6456 | [Parent 1300, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
05:03:48     INFO -  PID 6456 | [Child 6624, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
05:03:48     INFO -  PID 6456 | [Child 6624, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
05:06:03     INFO -  WARNING | IO Completion Port failed to signal process shutdown
05:06:03     INFO -  Parent process 6456 exited with children alive:
05:06:03     INFO -  PIDS: 9188, 6664, 7636, 7432, 1300, 6708
05:06:03     INFO -  Attempting to kill them, but no guarantee of success
05:06:03    ERROR - Return code: 2
05:06:03  WARNING - setting return code to 2
05:06:03    ERROR - # TBPL FAILURE #
05:06:03     INFO - Running post-action listener: _package_coverage_data
05:06:03     INFO - Running post-action listener: _resource_record_post_action
05:06:03     INFO - Running post-action listener: process_java_coverage_data
05:06:03     INFO - [mozharness: 2018-09-27 05:06:03.139000Z] Finished run-tests step (success)
05:06:03     INFO - Running post-run listener: _resource_record_post_run
05:06:03     INFO - Total resource usage - Wall time: 332s; CPU: 5.0%; Read bytes: 52033536; Write bytes: 237014528; Read time: 1; Write time: 6
05:06:03     INFO - TinderboxPrint: CPU usage<br/>4.7%
05:06:03     INFO - TinderboxPrint: I/O read bytes / time<br/>52,033,536 / 1
05:06:03     INFO - TinderboxPrint: I/O write bytes / time<br/>237,014,528 / 6
05:06:03     INFO - TinderboxPrint: CPU idle<br/>2,532.9 (95.3%)
05:06:03     INFO - TinderboxPrint: CPU user<br/>101.8 (3.8%)
05:06:03     INFO - install - Wall time: 2s; CPU: 13.0%; Read bytes: 4096; Write bytes: 4440064; Read time: 0; Write time: 0
05:06:03     INFO - setup-mitmproxy - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
05:06:03     INFO - run-tests - Wall time: 332s; CPU: 5.0%; Read bytes: 48605184; Write bytes: 213903872; Read time: 1; Write time: 5
05:06:03     INFO - Running post-run listener: copy_logs_to_upload_dir
05:06:03     INFO - Copying logs to upload dir...
05:06:03     INFO - mkdir: C:\Users\task_1538024168\build\upload\logs
05:06:03     INFO - Copying logs to upload dir...
05:06:03     INFO - Using _rmtree_windows ...
05:06:03     INFO - Running command: del /F /Q "C:\Users\task_1538024168\build\upload\logs\localconfig.json"
05:06:03     INFO - Return code: 0
05:06:03     INFO - Using _rmtree_windows ...
05:06:03     INFO - Running command: del /F /Q "C:\Users\task_1538024168\build\upload\logs\talos_info.log"
05:06:03     INFO - Return code: 0
05:06:03     INFO - Using _rmtree_windows ...
05:06:03     INFO - Running command: del /F /Q "C:\Users\task_1538024168\build\upload\logs\talos_raw.log"
05:06:03     INFO - Return code: 0
05:06:03     INFO - Using _rmtree_windows ...
05:06:03     INFO - Running command: del /F /Q "C:\Users\task_1538024168\build\upload\logs\talos_warning.log"
05:06:03     INFO - Return code: 0
05:06:03     INFO - Using _rmtree_windows ...
05:06:03     INFO - Running command: del /F /Q "C:\Users\task_1538024168\build\upload\logs\talos_critical.log"
05:06:03     INFO - Return code: 0
05:06:03     INFO - Using _rmtree_windows ...
05:06:03     INFO - Running command: del /F /Q "C:\Users\task_1538024168\build\upload\logs\talos_error.log"
05:06:03     INFO - Return code: 0
05:06:03     INFO - Using _rmtree_windows ...
05:06:03     INFO - Running command: del /F /Q "C:\Users\task_1538024168\build\upload\logs\talos_fatal.log"
05:06:03     INFO - Return code: 0
05:06:03  WARNING - returning nonzero exit status 2
[taskcluster 2018-09-27T05:06:03.351Z]    Exit Code: 2
[taskcluster 2018-09-27T05:06:03.351Z]    User Time: 0s
[taskcluster 2018-09-27T05:06:03.351Z]  Kernel Time: 0s
[taskcluster 2018-09-27T05:06:03.351Z]    Wall Time: 6m32.3318403s
[taskcluster 2018-09-27T05:06:03.351Z]       Result: FAILED
From the first two lines of log output in comment 0, the browser is existing immediately on startup:
https://treeherder.mozilla.org/logviewer.html#?job_id=201859714&repo=mozilla-central&lineNumber=1630-1631

This is well before extensions are loaded, are you sure this is related to bug 1493867?
Flags: needinfo?(aswan)
aswan: This is the only bug that I had found related to |tresize| in yesterday's merge. Can you point to someone that could help? Thanks!
Flags: needinfo?(aswan)
Summary: Perma [tier2] windows7-32-msvc tresize | Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))] → Perma windows7-32-msvc tresize/windows7 | Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))]
This is a wild stab in the dark, but did this also coincide with bug 1488554 landing?
The fact that the browser is just exiting immediately on startup really makes it unlikely that bug 1493867 is related, and I'm not very adept at debugging on Windows...
Flags: needinfo?(aswan)
Firefox is being started with -wait-for-browser, which is exactly what we want here...
Also note that this is actually intermittent.  In this test for instance:
https://treeherder.mozilla.org/logviewer.html#?job_id=202105647&repo=mozilla-inbound&lineNumber=2457

there are 13 successful cycles but the harness wants to run 20 cycles and Firefox exits immediately during the 14th cycle.  Since it is exiting silently without printing anything useful, ideally somebody familiar with Windows could recreate this and diagnose it.  I'm not sure who that would be...  Joel, do you have a go-to person for debugging on Windows?
Flags: needinfo?(jmaher)
we have issues on windows for unittests, right now I end up pushing towards :jimm, but he has a long backlog.
Flags: needinfo?(jmaher)
See Also: → 1495164
See Also: → 1220362
Unless I'm misunderstanding how talos tests are run, tresize is inherently racy. It attempts to load "resource://talos-powers/TalosPowersContent.js", but this is only available after the add-on has been installed and started, and I see nothing that guarantees this test runs after that has happened (again, unless I just can't tell how/when talos testcases load).
tresize-test.html attempts to load
"resource://talos-powers/TalosPowersContent.js", but this might not be available
yet if the tresize add-on hasn't loaded. This patch changes the location of this
resource to a relative path, which should always be available. Additionally,
this patch awaits on TalosPowers.loadPromise before using talos APIs.
This appears to work: https://treeherder.mozilla.org/#/jobs?repo=try&revision=12865653aaa26600f01fec673aca1fc905cb6321 (this is attaachment 9013363 on top of the patch from bug 1493427, which apparently made tresize essentially perma-fail before this change).
Assignee: nobody → dkeeler
Priority: P5 → P1
Comment on attachment 9013363 [details]
bug 1494573 - fix tresize flakiness by not having it load resources provided by the tresize add-on r?jmaher

Joel Maher ( :jmaher ) (UTC-4) has approved the revision.
Attachment #9013363 - Flags: review+
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fac0c84719b7
fix tresize flakiness by not having it load resources provided by the tresize add-on r=jmaher
https://hg.mozilla.org/mozilla-central/rev/fac0c84719b7
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
This updated our tresize baseline:

== Change summary for alert #16333 (as of Mon, 01 Oct 2018 16:50:43 GMT) ==

Improvements:

  3%  tresize linux64-qr opt e10s stylo     13.08 -> 12.67

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16333
You need to log in before you can comment on or make changes to this bug.