Windows PGO builds don't readahead and don't reorder omni.ja

RESOLVED FIXED in Firefox 67

Status

defect
RESOLVED FIXED
4 months ago
3 months ago

People

(Reporter: glandium, Assigned: glandium)

Tracking

unspecified
mozilla67
Dependency tree / graph

Firefox Tracking Flags

(firefox67 fixed)

Details

Attachments

(2 attachments)

Assignee

Description

4 months ago
No description provided.
Assignee

Comment 1

4 months ago

Corollary is that the builds should go red when they don't have this info.

Assignee

Updated

4 months ago
Summary: PGO builds don't have readahead information in omni.ja → Windows PGO builds don't have readahead information in omni.ja
Assignee

Comment 2

4 months ago

Looks like Linux builds are not affected.

Assignee

Comment 3

4 months ago

This is weird. PGO builds on try, integrations branches and even central don't have it. Neither do shipped betas nor releases. But nightlies do.

Assignee

Comment 4

4 months ago

55.0.3 is the last release with the readahead info. 56.0 is broken. 56.0b1 is broken too.

Assignee

Comment 5

4 months ago

So, it turns out this was broken on nightly too... until bug 1528900. The side effect is that releases and beta will be fixed by it riding trains. It's still broken on normal PGO builds, and we should unbreak those.

The core problem is that since bug 1316329, builds that "need" sccache (for some definition of need) build in z:\build instead of z:\task_random_number, where z:\build is a symbolic link to z:\task_random_number, and since bug 1269355 windows builds are marked as "need"ing sccache one way or another. Both bugs I mentioned here are well before 56, so what the hell? you may ask. Well, back then, we weren't shipping from taskcluster, we were still shipping from buildbot, so they were unaffected.

Bug 1357112 and bug 1381577 created the windows nightly tasks on taskcluster, and we only started shipping from those later.

Bug 1383257 might be what flipped the switch, and, retrospectively, bug 1383257 comment 97 should have raised eyebrows:

17% tp5n main_startup_fileio windows7-32 pgo e10s 82,145,881.76 -> 68,408,997.33

Yup, that's exactly what happens when readahead (and jar reordering) is lost.

Back to why this is broken, the jarlog that is used to optimize omni.ja contains paths in z:\task_random_number, while the build happens in z:\build, and the packager works in z:\build too, so it doesn't map the omni.ja path it knows to entries in the jarlog.

Summary: Windows PGO builds don't have readahead information in omni.ja → Windows PGO builds don't readahead and don't reorder omni.ja
Assignee

Updated

4 months ago
Depends on: 1530600

Ugh, that sucks. You probably need the equivalent of the fix I made in the symbol dumping code for mapping srcdir paths:
https://hg.mozilla.org/mozilla-central/rev/e5a62d4ce95a

Assignee

Updated

4 months ago
Depends on: 1529558, 1529551
Assignee

Updated

4 months ago
Depends on: 1531634
Assignee

Updated

4 months ago
Blocks: android-pgo
Assignee

Updated

4 months ago
Blocks: 1530600
No longer depends on: 1530600
Assignee

Comment 7

4 months ago

Instead of checking the MOZ_JAR_LOG_FILE for each log entry, only check
it once, and only check whether to log once per archive rather than once
per item.

Assignee

Comment 8

4 months ago

The jar log is used for optimization of the packaged jar files according
to their usage patterns during a profile run. The current content of the
file currently come with 2 caveats:

  • it contains entries for jar archives that aren't relevant to
    packaging, which is not a problem in itself, but see below.
  • it contains full paths for jar archives that may not correspond to the
    location of the packaged directory (on e.g. Android, where the build
    almost certainly doesn't happen in the same directory on the host as
    Fennec runs in the emulator/on the device).

The current JarLog code does somehow handle the various ways paths are
currently presented, but it's clearly missing code to map the paths in
the log to packaged paths. Instead of requiring manual work and extra
build options to handle this mapping, and considering the caveats above,
it's just simpler to log archive paths as if they were relative to the
packaged application directory in a build, and use that during
packaging.

Depends on D21655

Comment 9

4 months ago
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/e93f4871731b
Change zip logging initialization. r=aklotz
https://hg.mozilla.org/integration/autoland/rev/3c37fd9d30d5
Change jar log content. r=aklotz,chmanchester

Comment 10

4 months ago
Pushed by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e2d7b59776a2
fix flake failure on a CLOSED TREE.

Comment 11

4 months ago
Pushed by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d31f88692394
Fix more flake failures on a CLOSED TREE.

Backed out 4 changesets (bug 1529894) for xperf failures on a CLOSED TREE.

Backout link: https://hg.mozilla.org/integration/autoland/rev/24e79b0a187fdff462d085bc50b1adcc269e3509

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=3c37fd9d30d50b66b9b899c3f85758079f8e31d1&selectedJob=232116707

Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=232116707&repo=autoland&lineNumber=1423

Log snippet:

10:43:03 INFO - PID 3488 | [Child 2148, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
10:43:03 INFO - PID 3488 | [Child 2148, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
10:43:03 INFO - PID 3488 | [Parent 4376, Gecko_IOThread] WARNING: file z:/build/build/src/ipc/chromium/src/base/process_util_win.cc, line 160
10:43:05 INFO - TEST-INFO | 3488: exit 0
10:43:06 INFO - The trace you have just captured "Z:\task_1551866016\build\test.etl.kernel" may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
10:43:06 INFO - The trace you have just captured "Z:\task_1551866016\build\test.etl.user" may contain personally identifiable information, including but not necessarily limited to paths to files accessed, paths to registry accessed and process names. Exact information depends on the events that were logged. Please be aware of this when sharing out this trace with other people.
10:43:15 INFO - Merged Etl: test.etl
10:45:05 INFO - c:/Program Files/Microsoft Windows Performance Toolkit/xperf: warning: applying restriction of access for trace processing
10:46:01 INFO - reading etl filename: test.etl
10:46:01 INFO - etlparser: in readfile: test.etl.csv
10:46:01 INFO - TEST-UNEXPECTED-FAIL : xperf: File '{firefox}\browser\features\formautofill@mozilla.org.xpi' (normalized from 'Z:\task_1551866016\build\application\firefox\browser\features\formautofill@mozilla.org.xpi') was accessed and we were not expecting it. DiskReadCount: 6, DiskWriteCount: 0, DiskReadBytes: 393216, DiskWriteBytes: 0
10:46:05 INFO - extending with xperf!
10:46:05 INFO - Detected a regression for tp5n
10:46:05 INFO - TEST-UNEXPECTED-FAIL | tp5n | Talos has found a regression, if you have questions ask for help in irc on #perf
10:46:05 ERROR - Traceback (most recent call last):
10:46:05 INFO - File "Z:\task_1551866016\build\tests\talos\talos\run_tests.py", line 300, in run_tests
10:46:05 INFO - talos_results.add(mytest.runTest(browser_config, test))
10:46:05 INFO - File "Z:\task_1551866016\build\tests\talos\talos\ttest.py", line 64, in runTest
10:46:05 INFO - return self._runTest(browser_config, test_config, setup)
10:46:05 INFO - File "Z:\task_1551866016\build\tests\talos\talos\ttest.py", line 268, in _runTest
10:46:05 INFO - 'Talos has found a regression, if you have questions'
10:46:05 INFO - TalosRegression: Talos has found a regression, if you have questions ask for help in irc on #perf
10:46:05 INFO - TEST-INFO took 285926ms
10:46:05 INFO - SUITE-END | took 285s
10:46:06 ERROR - Return code: 1
10:46:06 WARNING - setting return code to 1
10:46:06 ERROR - # TBPL WARNING #
10:46:06 INFO - Running post-action listener: _package_coverage_data
10:46:06 INFO - Running post-action listener: _resource_record_post_action
10:46:06 INFO - Running post-action listener: process_java_coverage_data
10:46:06 INFO - [mozharness: 2019-03-06 10:46:06.245000Z] Finished run-tests step (success)

Flags: needinfo?(mh+mozilla)

Comment 13

4 months ago
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/e19e5857c7a8
Change zip logging initialization. r=aklotz
https://hg.mozilla.org/integration/autoland/rev/346002ff619e
Change jar log content. r=aklotz,chmanchester
Assignee

Updated

4 months ago
Flags: needinfo?(mh+mozilla)

Comment 14

4 months ago
bugherder
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Assignee

Updated

4 months ago
Blocks: 1534381
You need to log in before you can comment on or make changes to this bug.