Closed Bug 1381846 Opened 2 years ago Closed 2 years ago

Talos tpaint fails locally when using --geckoProfile

Categories

(Testing :: Talos, defect)

55 Branch
defect
Not set

Tracking

(firefox59 fixed)

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: igoldan, Assigned: rwood)

References

(Blocks 1 open bug)

Details

(Whiteboard: [PI:January])

Attachments

(1 file)

When running | mach talos-test -a tpaint --geckoProfile | on my local machine, I get this error, which crashes the perf test.

...
16:42:23     INFO -  starting webserver on 'localhost:54554'
16:42:23     INFO -  SUITE-START | Running 1 tests
16:42:23     INFO -  TEST-START | tpaint
16:42:23     INFO -  Initialising browser for tpaint test...
16:42:23     INFO -  Application command: e:\workspace\mozilla-source\mozilla-central\objdir-frontend\dist\bin\firefox --no-remote http://localhost:54554/getInfo.html -profile c:\users\ionut~1.gol\appdata\local\temp\tmpbostwg\profile
16:42:23     INFO -  TEST-INFO | started process 18852 (e:\workspace\mozilla-source\mozilla-central\objdir-frontend\dist\bin\firefox --no-remote http://localhost:54554/getInfo.html)
16:42:27     INFO -  TEST-INFO | 18852: exit 0
16:42:27     INFO -  Clearing archive e:/workspace/mozilla-source/mozilla-central\testing\mozharness\build\blobber_upload_dir\profile_tpaint.zip
16:42:27     INFO -  Activating Gecko Profiling. Temp. profile dir: c:\users\ionut~1.gol\appdata\local\temp\tmpyewdjg, interval: 1, entries: 2000000
16:42:27     INFO -  Browser initialized.
16:42:27     INFO -  Running cycle 1/1 for tpaint test...
16:42:27     INFO -  TEST-INFO | started process 14056 (e:\workspace\mozilla-source\mozilla-central\objdir-frontend\dist\bin\firefox --no-remote -profile c:\users\ionut~1.gol\appdata\local\temp\tmpbostwg\profile -tpprofilinginfo {"gecko_profile_entries": 2000000, "gecko_profile_threads": "GeckoMain,Compositor", "gecko_profile_dir": "c:\\users\\ionut~1.gol\\appdata\\local\\temp\\tmpyewdjg", "gecko_profile_interval": 1} -tp file:\e:\workspace\mozilla-source\mozilla-central\testing\talos\talos\tests\tpaint\tpaint.manifest.develop -tpchrome -tpmozafterpaint -tpnoisy -tpcycles 1 -tppagecycles 2)
16:42:39     INFO -  PID 14056 | RSS: Main: 192946176
16:42:39     INFO -  PID 14056 |
16:42:40     INFO -  PID 14056 | openingTime=209.1500000000001
16:42:40     INFO -  PID 14056 | Cycle 1(1): loaded http://localhost:54554/tests/tpaint/tpaint.html?auto=1 (next: http://localhost:54554/tests/tpaint/tpaint.html?auto=1)
16:42:41     INFO -  PID 14056 | RSS: Main: 198553600
16:42:41     INFO -  PID 14056 |
16:42:42     INFO -  PID 14056 | openingTime=206.70499999999993
16:42:42     INFO -  PID 14056 | Cycle 1(2): loaded http://localhost:54554/tests/tpaint/tpaint.html?auto=1 (next: http://localhost:54554/tests/tpaint/tpaint.html?auto=1)
16:43:26     INFO -  PID 14056 | Unable to read VR Path Registry from C:\Users\ionut.goldan\AppData\Local\openvr\openvrpaths.vrpath
16:52:27     INFO -  Timeout waiting for test completion; killing browser...
16:52:27     INFO -  mozcrash Saved minidump as e:/workspace/mozilla-source/mozilla-central\testing\mozharness\build\blobber_upload_dir\3c82d83e-e6d2-4d75-a05d-22a41296194a.dmp
16:52:28     INFO -  PROCESS-CRASH | tpaint | application crashed [unknown top frame]
16:52:28     INFO -  Crash dump filename: c:\users\ionut~1.gol\appdata\local\temp\tmpbostwg\profile\minidumps\3c82d83e-e6d2-4d75-a05d-22a41296194a.dmp
16:52:28     INFO -  MINIDUMP_STACKWALK not set, can't process dump.
16:52:28     INFO -  rmtree() failed for "('c:\\users\\ionut~1.gol\\appdata\\local\\temp\\tmpbostwg',)". Reason: The directory is not empty (41). Retrying...
16:52:28     INFO -  TEST-UNEXPECTED-ERROR | tpaint | Found crashes after test run, terminating test
16:52:28    ERROR -  Traceback (most recent call last):
16:52:28     INFO -    File "e:/workspace/mozilla-source/mozilla-central\testing\talos\talos\run_tests.py", line 268, in run_tests
16:52:28     INFO -      talos_results.add(mytest.runTest(browser_config, test))
16:52:28     INFO -    File "e:\workspace\mozilla-source\mozilla-central\testing\talos\talos\ttest.py", line 61, in runTest
16:52:28     INFO -      return self._runTest(browser_config, test_config, setup)
16:52:28     INFO -    File "e:\workspace\mozilla-source\mozilla-central\testing\talos\talos\ttest.py", line 197, in _runTest
16:52:28     INFO -      test_config['name'])
16:52:28     INFO -    File "e:\workspace\mozilla-source\mozilla-central\testing\talos\talos\ttest.py", line 45, in check_for_crashes
16:52:28     INFO -      raise TalosCrash("Found crashes after test run, terminating test")
16:52:28     INFO -  TalosCrash: Found crashes after test run, terminating test
16:52:28     INFO -  TEST-INFO took 604575ms
16:52:28     INFO -  SUITE-END | took 604s
16:52:28    ERROR - Return code: 2
16:52:28  WARNING - setting return code to 2
16:52:28    ERROR - # TBPL FAILURE #
16:52:28     INFO - Running post-action listener: _package_coverage_data
16:52:28     INFO - Running post-action listener: _resource_record_post_action
16:52:28     INFO - [mozharness: 2017-07-18 13:52:28.460000Z] Finished run-tests step (success)
16:52:28     INFO - Running post-run listener: _resource_record_post_run
16:52:28     INFO - Total resource usage - Wall time: 606s; CPU: 15.0%; Read bytes: 1366809600; Write bytes: 270309888; Read time: 1022080; Write time: 1416730
16:52:28     INFO - TinderboxPrint: CPU usage<br/>15.0%
16:52:28     INFO - TinderboxPrint: I/O read bytes / time<br/>1,366,809,600 / 1,022,080
16:52:28     INFO - TinderboxPrint: I/O write bytes / time<br/>270,309,888 / 1,416,730
16:52:28     INFO - TinderboxPrint: CPU idle<br/>2,070.1 (85.0%)
16:52:28     INFO - TinderboxPrint: CPU system<br/>92.0 (3.8%)
16:52:28     INFO - TinderboxPrint: CPU user<br/>263.7 (10.8%)
16:52:28     INFO - setup-mitmproxy - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
16:52:28     INFO - run-tests - Wall time: 607s; CPU: 15.0%; Read bytes: 1366809600; Write bytes: 270309888; Read time: 1022080; Write time: 1416730
16:52:28     INFO - Running post-run listener: _upload_blobber_files
16:52:28  WARNING - Blob upload gear skipped. Missing cmdline options.
...
:igoldan, is this still an issue? If not, please close out the bug, thanks!
Flags: needinfo?(igoldan)
Blocks: 1425103
Summary: Talos test fails locally when using --geckoProfile → Talos tpaint fails locally when using --geckoProfile
Whiteboard: [PI:January]
This is still an issue, also reported by Xidorn Quan
Flags: needinfo?(igoldan)
Assignee: nobody → rwood
Status: NEW → ASSIGNED
Hey Ionut, this is working fine for me locally on OSX with the latest inbound:

./mach talos-test --activeTests tpaint --geckoProfile

The test finishes fine, and the profile_tpaint.zip is created successfully in the local blobber_uplaod_dir.

Can you please pull the latest and try it? Looks to me like it has been fixed somehow... unless it is platform specific. Thanks!
Flags: needinfo?(igoldan)
Yes, will check this on Windows 10 and Ubuntu 16.04.
Flags: needinfo?(igoldan)
(In reply to Robert Wood [:rwood] from comment #3)
> ./mach talos-test --activeTests tpaint --geckoProfile

Do you remember how long this took you?
Flags: needinfo?(rwood)
I ran this on Ubuntu and it worked. It produced the same file you mentioned for tpaint.

On Windows 10 this still fails, with the same traceback. So you were right, this bug is platform specific.
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #6)
> I ran this on Ubuntu and it worked. It produced the same file you mentioned
> for tpaint.
> 
> On Windows 10 this still fails, with the same traceback. So you were right,
> this bug is platform specific.

Ok thanks Ionut! I'll look at this on Windows.
Flags: needinfo?(rwood)
Getting it narrowed down with some pageloader debugging:

14:36:13     INFO -  TEST-INFO | started process 3896 (c:\mozilla-source\mozilla-central\obj-i686-pc-mingw32\dist\bin\firefox --no-remote -profile c:\users\robert~1\appdata\local\temp\tmp20phhz\profile)
14:36:15     INFO -  PID 3896 | inside plLoadPage
14:36:15     INFO -  PID 3896 | about to call profiler beginTest
14:36:15     INFO -  PID 3896 | adding our chrome message listeners
14:36:15     INFO -  PID 3896 | about to call profiler.mark to mark opening the page
14:36:15     INFO -  PID 3896 | inside startAndLoadURI, pageName is:
14:36:15     INFO -  PID 3896 | http://localhost:50862/tests/tpaint/tpaint.html?auto=1
14:36:15     INFO -  PID 3896 | we just grabbed our start_time
14:36:15     INFO -  PID 3896 | about to call content.loadURI
14:36:15     INFO -  PID 3896 | inside tpaint startTest
14:36:15     INFO -  PID 3896 | inside tpaint scheduleNextWindow
14:36:16     INFO -  PID 3896 | inside ContentListener, received:
14:36:16     INFO -  PID 3896 | PageLoader:IdleCallbackSet
14:36:16     INFO -  PID 3896 | inside ContentListener, received:
14:36:16     INFO -  PID 3896 | PageLoader:IdleCallbackReceived
14:36:16     INFO -  PID 3896 | inside ContentListener, received:
14:36:16     INFO -  PID 3896 | PageLoader:LoadEvent
14:36:16     INFO -  PID 3896 | inside plLoadHandlerMessage
14:36:17     INFO -  PID 3896 | inside tpaint openWindow
14:36:17     INFO -  PID 3896 | inside tpaint childIsOpen
14:36:17     INFO -  PID 3896 | inside tpaint reportTimes
14:36:17     INFO -  PID 3896 | openingTime=269.53999999999996
14:36:17     INFO -  PID 3896 | inside ContentListener, received:
14:36:17     INFO -  PID 3896 | PageLoader:RecordTime
14:36:17     INFO -  PID 3896 | inside plRecordTime
14:36:17     INFO -  PID 3896 | Cycle 1(1): loaded http://localhost:50862/tests/tpaint/tpaint.html?auto=1 (next: http://localhost:50862/tests/tpaint/tpaint.html?auto=1)
14:36:17     INFO -  PID 3896 | *RW* inside plNextPage
14:36:17     INFO -  PID 3896 | about to call await TalosParentProfiler.finishTest
<hangs>
Getting closer! The profile is failing to be written to file:

16:12:59     INFO -  PID 4836 | Cycle 1(1): loaded http://localhost:51825/tests/tpaint/tpaint.html?auto=1 (next: http://localhost:51825/tests/tpaint/tpaint.html?auto=1)
16:12:59     INFO -  PID 4836 | inside talosParentProfiler.pause
16:12:59     INFO -  PID 4836 | inside TalosPowersService.profilerPause
16:13:00     INFO -  PID 4836 | *RW* inside plNextPage
16:13:00     INFO -  PID 4836 | about to call await TalosParentProfiler.finishTest
16:13:00     INFO -  PID 4836 | inside talosParentProfiler.finishTest
16:13:00     INFO -  PID 4836 | about to call TalosPowers.profilerFinish
16:13:00     INFO -  PID 4836 | inside TalosPowersService.profilerFinish
16:13:00     INFO -  PID 4836 | about to call OS.File.writeAtomic with:
16:13:00     INFO -  PID 4836 | profileFile: c:\users\robert~1\appdata\local\temp\tmpqoi8nr/tpaint.html?auto=1_pagecycle_1.profile
16:13:00     INFO -  PID 4836 | array length: 1014340
<hangs here for 10 min then pageloader times out>

Same debugging but with a cycle of perf-reftest-singletons-e10s works fine (same machine):

16:10:15     INFO -  PID 6844 | Cycle 1(1): loaded http://localhost:51683/tests/perf-reftest-singletons/bidi-resolution-1.html (next: http://localhost:51683/tests/perf-reftest-singletons/bloom-basic.html)
16:10:15     INFO -  PID 6844 | inside talosParentProfiler.pause
16:10:15     INFO -  PID 6844 | inside TalosPowersService.profilerPause
16:10:15     INFO -  PID 6844 | inside ContentListener, received:
16:10:15     INFO -  PID 6844 | PageLoader:LoadEvent
16:10:15     INFO -  PID 6844 | inside plLoadHandlerMessage
16:10:15     INFO -  PID 6844 | inside ContentListener, received:
16:10:15     INFO -  PID 6844 | PageLoader:IdleCallbackSet
16:10:15     INFO -  PID 6844 | inside ContentListener, received:
16:10:15     INFO -  PID 6844 | PageLoader:IdleCallbackReceived
16:10:15     INFO -  PID 6844 | *RW* inside plNextPage
16:10:15     INFO -  PID 6844 | about to call await TalosParentProfiler.finishTest
16:10:15     INFO -  PID 6844 | inside talosParentProfiler.finishTest
16:10:15     INFO -  PID 6844 | about to call TalosPowers.profilerFinish
16:10:15     INFO -  PID 6844 | inside TalosPowersService.profilerFinish
16:10:27     INFO -  PID 6844 | about to call OS.File.writeAtomic with:
16:10:27     INFO -  PID 6844 | profileFile: c:\users\robert~1\appdata\local\temp\tmp9d66x6/bidi-resolution-1.html_pagecycle_1.profile
16:10:27     INFO -  PID 6844 | array length: 120391048
16:10:27     INFO -  PID 6844 | about to call Services.profiler.StopProfiler
16:10:27     INFO -  PID 6844 | back from stopProfiler, resolving promise
16:10:27     INFO -  PID 6844 | back from profiler finishTest
16:10:27     INFO -  PID 6844 | doNextPage is true
16:10:27     INFO -  PID 6844 | about to force GC
16:10:28     INFO -  PID 6844 | about to settimeout to call plLoadPage
16:10:28     INFO -  PID 6844 | inside plLoadPage
16:10:28     INFO -  PID 6844 | about to call profiler beginTest
16:10:28     INFO -  PID 6844 | inside talosParentProfiler.beginTest
16:10:28     INFO -  PID 6844 | about to call TalosPowers.profilerBegin
16:10:28     INFO -  PID 6844 | inside TalosPowersService.profilerBegin, about to call StartProfiler
16:10:28     INFO -  PID 6844 | about to call PauseSamplingadding our chrome message listeners
16:10:28     INFO -  PID 6844 | about to call profiler.mark to mark opening the page
16:10:28     INFO -  PID 6844 | inside talosParentProfiler.mark
16:10:28     INFO -  PID 6844 | inside TalosPowersService.profilerMarker
16:10:28     INFO -  PID 6844 | inside startAndLoadURI, pageName is:
16:10:28     INFO -  PID 6844 | http://localhost:51683/tests/perf-reftest-singletons/bidi-resolution-1.html
16:10:28     INFO -  PID 6844 | we just grabbed our start_time
16:10:28     INFO -  PID 6844 | about to call content.loadURI
16:10:29     INFO -  PID 6844 | inside ContentListener, received:
16:10:29     INFO -  PID 6844 | PageLoader:RecordTime
16:10:29     INFO -  PID 6844 | inside plRecordTime
16:10:29     INFO -  PID 6844 | Cycle 1(2): loaded http://localhost:51683/tests/perf-reftest-singletons/bidi-resolution-1.html (next: http://localhost:51683/tests/perf-reftest-singletons/bloom-basic.html)

Ahhh...

I'm guessing the profileFile name is invalid on Windows (but valid on linux/OSX) for tpaint: profileFile: c:\users\robert~1\appdata\local\temp\tmpqoi8nr/tpaint.html?auto=1_pagecycle_1.profile
Comment on attachment 8941223 [details]
Bug 1381846 - Fix gecko profiling with talos tpaint on windows;

https://reviewboard.mozilla.org/r/211488/#review217302

great find
Attachment #8941223 - Flags: review?(jmaher) → review+
Oh sweet this also fixes profiling with talos-chrome on Windows
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/731b1321ba95
Fix gecko profiling with talos tpaint on windows; r=jmaher
https://hg.mozilla.org/mozilla-central/rev/731b1321ba95
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
You need to log in before you can comment on or make changes to this bug.