Closed Bug 1445922 Opened 7 years ago Closed 7 years ago

Intermittent z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(754) : fatal error C1002: compiler is out of heap space in pass 2

Categories

(Core :: Audio/Video: Playback, defect, P5)

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: away)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:product])

Attachments

(1 file)

From bug 1312238 comment 38
> 12:22:13 INFO -
> z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(719) :
> fatal error C1002: compiler is out of heap space in pass 2
>
> Since this is in AOM code, I suspect it is due to the large function sizes
> seen in bug 1412889. It is fixed upstream but our efforts to update (bug
> 1445683) have hit roadblocks.
>
> If this is blocking you, we can probably just disable PGO in the affected
> code (AOM won't be hit in a profile anyway).
Depends on: 1445683
There are 30 failures in the past 7 days, all occurrences happened on windows2012-32 pgo.

Recent log failure: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=171998012&lineNumber=37516

Relevant part of the log:

02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(754) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v256_intrinsics_c.h(101) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(719) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\av1\common\cdef_block_simd.h(252) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(271) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - LINK : fatal error LNK1257: code generation failed
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v128_intrinsics_c.h(83) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:/build/build/src/config/rules.mk:679: recipe for target 'xul.dll' failed
02:04:15 INFO - mozmake.EXE[5]: *** [xul.dll] Error 1257
02:04:15 INFO - mozmake.EXE[5]: Leaving directory 'z:/build/build/src/obj-firefox/toolkit/library'
02:04:15 INFO - z:/build/build/src/config/recurse.mk:73: recipe for target 'toolkit/library/target' failed
02:04:15 INFO - mozmake.EXE[4]: *** [toolkit/library/target] Error 2
02:04:15 INFO - z:/build/build/src/config/recurse.mk:32: recipe for target 'compile' failed
02:04:15 INFO - mozmake.EXE[3]: *** [compile] Error 2
02:04:15 INFO - z:/build/build/src/config/rules.mk:418: recipe for target 'default' failed
02:04:15 INFO - mozmake.EXE[2]: *** [default] Error 2
02:04:15 INFO - Makefile:237: recipe for target 'profiledbuild' failed
02:04:15 INFO - mozmake.EXE[1]: *** [profiledbuild] Error 2
02:04:15 INFO - client.mk:168: recipe for target 'build' failed
02:04:15 INFO - mozmake.EXE: *** [build] Error 2
02:04:15 INFO - 125 compiler warnings present.
02:04:15 ERROR - Return code: 2
02:04:15 WARNING - setting return code to 2
02:04:15 FATAL - 'mach build' did not run successfully. Please check log for errors.
02:04:15 FATAL - Running post_fatal callback...
02:04:15 FATAL - Exiting -1
02:04:15 INFO - [mozharness: 2018-04-05 02:04:15.668000Z] Finished build step (failed)
02:04:15 INFO - Running post-run listener: _summarize
02:04:15 INFO - [mozharness: 2018-04-05 02:04:15.668000Z] FxDesktopBuild summary:
02:04:15 INFO - Running post-run listener: copy_logs_to_upload_dir
02:04:15 INFO - Copying logs to upload dir...
02:04:15 INFO - mkdir: z:\build\build\upload\logs
[taskcluster:error] Exit Code: 4294967295
[taskcluster:error] User Time: 0s
[taskcluster:error] Kernel Time: 15.625ms
[taskcluster:error] Wall Time: 1h4m35.327078s
[taskcluster:error] Result: FAILED
[taskcluster 2018-04-05T02:04:15.883Z] === Task Finished ===
[taskcluster 2018-04-05T02:04:15.883Z] Task Duration: 1h11m23.565004s
[taskcluster:error] Uploading error artifact public/build from file public/build with message "Could not read directory 'Z:\\task_1522887787\\public\\build'", reason "file-missing-on-worker" and expiry 2019-04-05T00:51:41.733Z
[taskcluster:error] TASK FAILURE during artifact upload: file-missing-on-worker: Could not read directory 'Z:\task_1522887787\public\build'
[taskcluster 2018-04-05T02:04:16.735Z] Uploading artifact public/logs/certified.log from file generic-worker\certified.log with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2019-04-05T00:51:41.733Z
[taskcluster 2018-04-05T02:04:18.260Z] Uploading artifact public/chainOfTrust.json.asc from file generic-worker\chainOfTrust.json.asc with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2019-04-05T00:51:41.733Z
[taskcluster 2018-04-05T02:04:18.977Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/GnRhnrFCSQ6NrgY2l5eB8w/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2019-04-05T00:51:41.733Z
[taskcluster:error] Task not successful due to following exception(s):
[taskcluster:error] Exception 1)
[taskcluster:error] exit status 4294967295
[taskcluster:error] Exception 2)
[taskcluster:error] file-missing-on-worker: Could not read directory 'Z:\task_1522887787\public\build'
[taskcluster:error]

:drno Can you please take a look here?
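For triage, it can help to see which source files the C1002 errors cluster in. The following is a minimal sketch, not part of any Mozilla tooling: the function name `count_c1002` and the regular expression are assumptions made for illustration, and the sample text is copied from the log excerpt in this comment.

```python
import re
from collections import Counter

# Matches MSVC diagnostics shaped like "path(line) : fatal error C1002: ..."
C1002_RE = re.compile(r"(?P<path>\S+)\((?P<line>\d+)\)\s*:\s*fatal error C1002")


def count_c1002(log_text):
    """Return a Counter mapping source file name -> number of C1002 errors."""
    counts = Counter()
    for match in C1002_RE.finditer(log_text):
        # Normalize Windows separators, then keep only the file name.
        filename = match.group("path").replace("\\", "/").rsplit("/", 1)[-1]
        counts[filename] += 1
    return counts


# Lines copied from the log excerpt in this bug.
sample = r"""
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(754) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v256_intrinsics_c.h(101) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(719) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - LINK : fatal error LNK1257: code generation failed
"""

c1002_counts = count_c1002(sample)
for name, n in c1002_counts.most_common():
    print(name, n)
```

The LNK1257 line is deliberately not counted, since it is the linker reporting the downstream failure rather than a separate out-of-heap event.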
Flags: needinfo?(drno)
Whiteboard: [stockwell needswork]
(In reply to David Major [:dmajor] from comment #4)
> Alternatively we could cherry pick
> https://aomedia-review.googlesource.com/c/aom/+/39401
>
> and https://chromium-review.googlesource.com/c/webm/libvpx/+/841103 for good measure

Drno, what do you think?
(In reply to David Major [:dmajor] from comment #8)
> (In reply to David Major [:dmajor] from comment #4)
> > Alternatively we could cherry pick
> > https://aomedia-review.googlesource.com/c/aom/+/39401
> >
> > and https://chromium-review.googlesource.com/c/webm/libvpx/+/841103 for good measure
>
> Drno, what do you think?

So the alternatives are:
- either disable PGO on win32 for this code
- wait for the libaom update to land in bug 1445683
- or cherry pick build fixes from upstream

Since bug 1445683 might still be a little bit out, I guess either disable PGO or cherry pick. I don't have a preference for either. David, feel free to choose either one.
Flags: needinfo?(drno)
I tried cherry-picking those two fixes and my try push still hit this intermittent. Code outside of aom is failing too:

15:24:55 INFO - z:\build\build\src\js\src\vm\interpreter.cpp(4339) : fatal error C1002: compiler is out of heap space in pass 2

I'm starting to think this is less about particular functions and more about xul.dll simply growing larger by the day.

Can we add more memory to the win32 pgo builders?
Flags: needinfo?(catlee)
See Also: → 1453061
(In reply to David Major [:dmajor] from comment #11)
> Can we add more memory to the win32 pgo builders?

If what I looked up was up-to-date, we're building Windows on c4.4xlarge AWS instances, so the next step up is c4.8xlarge, at slightly more than twice the price because screw you.

I don't know how much we're spending on Windows builds, but every number I've ever heard about our AWS spend has turned another chunk of my hair white, so I doubt that would sound like a good investment.

OTOH, because my sheriff colleagues are madmen and madwomen, very often the response to hitting this intermittent is to retrigger 5 builds, thus triggering 5 sets of tests, so it might be a close calculation to determine which would be cheaper, based on how frequently this hits, how frequently we do that, how much we could reduce the frequency of over-retriggering, and how much it would cost us in merged-around bustage to say "please stop retriggering PGO builds more than once" and wind up getting fewer retriggers of permaorange tests as a result.

Or, you know, we could just disable PGO for code that won't wind up being optimized anyway, and wind up spending *less* and failing less.
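The "close calculation" above can be sketched as a toy cost model. Everything here is an assumption for illustration: the function name `weekly_cost`, the unit costs, the build count, and the failure rate are placeholders, not real AWS prices or measured figures from this bug. The only inputs taken from the comment are the 5 retriggers per failure and the bigger instance costing slightly more than twice as much (modelled as 2.125x). It also assumes, optimistically, that the bigger instance would eliminate the failure entirely.

```python
def weekly_cost(builds, fail_rate, build_cost, retriggers_per_failure, test_cost):
    """Expected weekly spend: scheduled builds plus retriggered builds and tests.

    All inputs are hypothetical; costs are in arbitrary units where one
    build on the current instance type costs 1.0.
    """
    failed_builds = builds * fail_rate
    extra_runs = failed_builds * retriggers_per_failure
    # Each retrigger pays for another build and another set of tests.
    return builds * build_cost + extra_runs * (build_cost + test_cost)


# Placeholder scenario: 100 builds a week, a quarter of them failing.
current = weekly_cost(builds=100, fail_rate=0.25, build_cost=1.0,
                      retriggers_per_failure=5, test_cost=2.0)

# Bigger instance at slightly more than twice the price, assumed to never fail.
bigger = weekly_cost(builds=100, fail_rate=0.0, build_cost=2.125,
                     retriggers_per_failure=5, test_cost=2.0)

# Disabling PGO for the affected code: same instance, assumed to never fail.
no_pgo = weekly_cost(builds=100, fail_rate=0.0, build_cost=1.0,
                     retriggers_per_failure=5, test_cost=2.0)

print(current, bigger, no_pgo)
```

With these made-up inputs the ordering matches the argument in the comment, but the result depends entirely on the failure rate and retrigger habits plugged in, which is exactly why the comment calls it a close calculation.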
As a little measure of frequency, of the last 12 Win32 PGO builds to finish on mozilla-inbound, 11 failed this way in AOM code, and 1 hit an infra failure.
Oh, nevermind that frequency measure, that's just because

(In reply to David Major [:dmajor] from comment #11)
> I tried cherry-picking those two fixes and my try push still hit this
> intermittent.

Actually, they make this very nearly permanent.
(In reply to Phil Ringnalda (:philor) from comment #13) Ok, ok -- I wasn't aware of the configuration and pricing situation.
Flags: needinfo?(catlee)
Assignee: nobody → dmajor
Attachment #8967090 - Flags: review?(core-build-config-reviews)
Attachment #8967090 - Flags: review?(core-build-config-reviews) → review+
Keywords: checkin-needed
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c4e17dd68065 Disable PGO for Win32 libaom due to compiler OOMs. r=froydnj
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
This improved build times on Windows 7.

== Change summary for alert #12663 (as of Wed, 11 Apr 2018 21:37:14 GMT) ==

Improvements:

 11%  build times windows2012-32 pgo taskcluster-c4.4xlarge  4,682.35 -> 4,173.92

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12663
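As a quick check of the 11% figure, here is a small sketch of the arithmetic using the two values from the alert. The helper name `percent_improvement` is made up for illustration, and the alert text does not state the unit of the build-time values, so none is assumed here.

```python
def percent_improvement(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


# Values copied from alert #12663 (unit not stated in the alert text).
before, after = 4682.35, 4173.92
improvement = percent_improvement(before, after)
print(round(improvement))  # prints 11
```

The unrounded value is a little under 10.9%, which the alert summary rounds to 11%.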
See Also: → 1456500