Bug 1445922
Opened 7 years ago
Closed 7 years ago
Intermittent z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(754) : fatal error C1002: compiler is out of heap space in pass 2
Categories
(Core :: Audio/Video: Playback, defect, P5)
Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla61
| | Tracking | Status |
|---|---|---|
| firefox61 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: away)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:product])
Attachments
(1 file)
patch, 1.25 KB (froydnj: review+)
Comment hidden (Intermittent Failures Robot)
From bug 1312238 comment 38
> 12:22:13 INFO -
> z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(719) :
> fatal error C1002: compiler is out of heap space in pass 2
>
> Since this is in AOM code, I suspect it is due to the large function sizes
> seen in bug 1412889. It is fixed upstream but our efforts to update (bug
> 1445683) have hit roadblocks.
>
> If this is blocking you, we can probably just disable PGO in the affected
> code (AOM won't be hit in a profile anyway).
Oops, I mean bug 1412238 comment 38
Alternatively we could cherry pick https://aomedia-review.googlesource.com/c/aom/+/39401
and https://chromium-review.googlesource.com/c/webm/libvpx/+/841103 for good measure
Comment 7 • 7 years ago
There are 30 failures in the past 7 days, all occurrences happened on windows2012-32 pgo.
Recent log failure:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=171998012&lineNumber=37516
Relevant part of the log:
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(754) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v256_intrinsics_c.h(101) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(719) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\av1\common\cdef_block_simd.h(252) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(271) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - LINK : fatal error LNK1257: code generation failed
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v128_intrinsics_c.h(83) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:/build/build/src/config/rules.mk:679: recipe for target 'xul.dll' failed
02:04:15 INFO - mozmake.EXE[5]: *** [xul.dll] Error 1257
02:04:15 INFO - mozmake.EXE[5]: Leaving directory 'z:/build/build/src/obj-firefox/toolkit/library'
02:04:15 INFO - z:/build/build/src/config/recurse.mk:73: recipe for target 'toolkit/library/target' failed
02:04:15 INFO - mozmake.EXE[4]: *** [toolkit/library/target] Error 2
02:04:15 INFO - z:/build/build/src/config/recurse.mk:32: recipe for target 'compile' failed
02:04:15 INFO - mozmake.EXE[3]: *** [compile] Error 2
02:04:15 INFO - z:/build/build/src/config/rules.mk:418: recipe for target 'default' failed
02:04:15 INFO - mozmake.EXE[2]: *** [default] Error 2
02:04:15 INFO - Makefile:237: recipe for target 'profiledbuild' failed
02:04:15 INFO - mozmake.EXE[1]: *** [profiledbuild] Error 2
02:04:15 INFO - client.mk:168: recipe for target 'build' failed
02:04:15 INFO - mozmake.EXE: *** [build] Error 2
02:04:15 INFO - 125 compiler warnings present.
02:04:15 ERROR - Return code: 2
02:04:15 WARNING - setting return code to 2
02:04:15 FATAL - 'mach build' did not run successfully. Please check log for errors.
02:04:15 FATAL - Running post_fatal callback...
02:04:15 FATAL - Exiting -1
02:04:15 INFO - [mozharness: 2018-04-05 02:04:15.668000Z] Finished build step (failed)
02:04:15 INFO - Running post-run listener: _summarize
02:04:15 INFO - [mozharness: 2018-04-05 02:04:15.668000Z] FxDesktopBuild summary:
02:04:15 INFO - Running post-run listener: copy_logs_to_upload_dir
02:04:15 INFO - Copying logs to upload dir...
02:04:15 INFO - mkdir: z:\build\build\upload\logs
[taskcluster:error] Exit Code: 4294967295
[taskcluster:error] User Time: 0s
[taskcluster:error] Kernel Time: 15.625ms
[taskcluster:error] Wall Time: 1h4m35.327078s
[taskcluster:error] Result: FAILED
[taskcluster 2018-04-05T02:04:15.883Z] === Task Finished ===
[taskcluster 2018-04-05T02:04:15.883Z] Task Duration: 1h11m23.565004s
[taskcluster:error] Uploading error artifact public/build from file public/build with message "Could not read directory 'Z:\\task_1522887787\\public\\build'", reason "file-missing-on-worker" and expiry 2019-04-05T00:51:41.733Z
[taskcluster:error] TASK FAILURE during artifact upload: file-missing-on-worker: Could not read directory 'Z:\task_1522887787\public\build'
[taskcluster 2018-04-05T02:04:16.735Z] Uploading artifact public/logs/certified.log from file generic-worker\certified.log with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2019-04-05T00:51:41.733Z
[taskcluster 2018-04-05T02:04:18.260Z] Uploading artifact public/chainOfTrust.json.asc from file generic-worker\chainOfTrust.json.asc with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2019-04-05T00:51:41.733Z
[taskcluster 2018-04-05T02:04:18.977Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/GnRhnrFCSQ6NrgY2l5eB8w/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2019-04-05T00:51:41.733Z
[taskcluster:error] Task not successful due to following exception(s):
[taskcluster:error] Exception 1)
[taskcluster:error] exit status 4294967295
[taskcluster:error] Exception 2)
[taskcluster:error] file-missing-on-worker: Could not read directory 'Z:\task_1522887787\public\build'
[taskcluster:error]
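A log like the one above can be summarized by counting C1002 errors per source file, which makes it easier to see whether the failures cluster in one library. The following is a minimal sketch, not part of any Mozilla tooling; the function name `count_c1002` and the regular expression are illustrative assumptions, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches MSVC diagnostics of the form: <path>(<line>) : fatal error C1002: ...
# The pattern is an assumption based on the log lines quoted in this bug.
C1002 = re.compile(r"(?P<path>\S+)\((?P<line>\d+)\)\s*:\s*fatal error C1002")


def count_c1002(log_text):
    """Count C1002 (compiler out of heap space) errors per source file name."""
    hits = Counter()
    for match in C1002.finditer(log_text):
        # Keep only the file name; the build root differs between workers.
        path = match.group("path").replace("/", "\\")
        hits[path.rsplit("\\", 1)[-1]] += 1
    return hits


# Sample lines copied from the failure log in this comment.
sample = r"""
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(754) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v256_intrinsics_c.h(101) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(719) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\av1\common\cdef_block_simd.h(252) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v64_intrinsics_c.h(271) : fatal error C1002: compiler is out of heap space in pass 2
02:04:15 INFO - LINK : fatal error LNK1257: code generation failed
02:04:15 INFO - z:\build\build\src\third_party\aom\aom_dsp\simd\v128_intrinsics_c.h(83) : fatal error C1002: compiler is out of heap space in pass 2
"""

for name, count in count_c1002(sample).most_common():
    print(f"{count} {name}")
```

In this particular log every C1002 hit lands in a header under third_party\aom, and the LNK1257 line is ignored because it is a linker error rather than a C1002 diagnostic.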
:drno Can you please take a look here?
Flags: needinfo?(drno)
Whiteboard: [stockwell needswork]
(In reply to David Major [:dmajor] from comment #4)
> Alternatively we could cherry pick
> https://aomedia-review.googlesource.com/c/aom/+/39401
>
> and https://chromium-review.googlesource.com/c/webm/libvpx/+/841103 for good measure
Drno, what do you think?
Comment 10 • 7 years ago
(In reply to David Major [:dmajor] from comment #8)
> (In reply to David Major [:dmajor] from comment #4)
> > Alternatively we could cherry pick
> > https://aomedia-review.googlesource.com/c/aom/+/39401
> >
> > and https://chromium-review.googlesource.com/c/webm/libvpx/+/841103 for good measure
>
> Drno, what do you think?
So the alternatives are:
- either disable PGO on win32 for this code
- wait for the libaom update to land in bug 1445683
- or cherry pick build fixes from upstream
Since bug 1445683 might still be a little way out, I guess we should either disable PGO or cherry-pick. I don't have a preference. David, feel free to choose either one.
Flags: needinfo?(drno)
Comment 11 • 7 years ago (Assignee)
I tried cherry-picking those two fixes and my try push still hit this intermittent.
Code outside of aom is failing too:
15:24:55 INFO - z:\build\build\src\js\src\vm\interpreter.cpp(4339) : fatal error C1002: compiler is out of heap space in pass 2
I'm starting to think this is less about particular functions and more about xul.dll simply growing larger by the day.
Can we add more memory to the win32 pgo builders?
Flags: needinfo?(catlee)
Comment 13 • 7 years ago
(In reply to David Major [:dmajor] from comment #11)
> Can we add more memory to the win32 pgo builders?
If what I looked up was up-to-date, we're building Windows on c4.4xlarge AWS instances, so the next step up is c4.8xlarge, at slightly more than twice the price because screw you. I don't know how much we're spending on Windows builds, but every number I've ever heard about our AWS spend has turned another chunk of my hair white, so I doubt that would sound like a good investment.
OTOH, because my sheriff colleagues are madmen and madwomen, very often the response to hitting this intermittent is to retrigger 5 builds, thus triggering 5 sets of tests, so it might be a close calculation to determine which would be cheaper, based on how frequently this hits, how frequently we do that, how much we could reduce the frequency of over-retriggering, and how much it would cost us in merged-around bustage to say "please stop retriggering PGO builds more than once" and wind up getting fewer retriggers of permaorange tests as a result.
Or, you know, we could just disable PGO for code that won't wind up being optimized anyway, and wind up spending *less* and failing less.
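For readers unfamiliar with how a single directory opts out of PGO: moz.build files are Python, and a directory can set the `NO_PGO` variable. The sketch below is not the patch attached to this bug (the attachment is not reproduced here); it only illustrates the general shape of such a change. `CONFIG` is stubbed as a plain dict with hypothetical values so the snippet runs outside the build sandbox, and the exact condition is an assumption.

```python
# Stand-in for the moz.build sandbox's CONFIG mapping.
# These values are hypothetical and describe a 32-bit Windows build.
CONFIG = {"OS_TARGET": "WINNT", "CPU_ARCH": "x86"}

# In a real moz.build file NO_PGO is provided by the sandbox and defaults to
# off; it is initialized here only so the sketch is self-contained.
NO_PGO = False

# Assumed condition: opt this directory out of PGO on Win32 only, so other
# platforms keep their existing behavior.
if CONFIG["OS_TARGET"] == "WINNT" and CONFIG["CPU_ARCH"] == "x86":
    NO_PGO = True

print("PGO disabled for this directory:", NO_PGO)
```

Scoping the opt-out to one directory and one platform keeps PGO enabled for the rest of xul.dll, which matches the observation that AOM code is not exercised by the profiling run anyway.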
Comment 14 • 7 years ago
As a little measure of frequency, of the last 12 Win32 PGO builds to finish on mozilla-inbound, 11 failed this way in AOM code, and 1 hit an infra failure.
Comment 15 • 7 years ago
Oh, never mind that frequency measure, that's just because
(In reply to David Major [:dmajor] from comment #11)
> I tried cherry-picking those two fixes and my try push still hit this
> intermittent.
Actually, they make this very nearly permanent.
Comment 16 • 7 years ago (Assignee)
(In reply to Phil Ringnalda (:philor) from comment #13)
Ok, ok -- I wasn't aware of the configuration and pricing situation.
Flags: needinfo?(catlee)
Comment 17 • 7 years ago (Assignee)
Assignee: nobody → dmajor
Attachment #8967090 - Flags: review?(core-build-config-reviews)
Updated • 7 years ago
Attachment #8967090 - Flags: review?(core-build-config-reviews) → review+
Keywords: checkin-needed
Comment 18 • 7 years ago
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c4e17dd68065
Disable PGO for Win32 libaom due to compiler OOMs. r=froydnj
Keywords: checkin-needed
Comment 19 • 7 years ago
bugherder
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox61: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Updated • 7 years ago
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
Comment 21 • 7 years ago
This improved build times on Windows 7.
== Change summary for alert #12663 (as of Wed, 11 Apr 2018 21:37:14 GMT) ==
Improvements:
11% build times windows2012-32 pgo taskcluster-c4.4xlarge 4,682.35 -> 4,173.92
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12663
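The 11% figure in the alert can be reproduced from the two build-time values it reports. This is a small arithmetic sketch; the helper name `percent_improvement` is illustrative, and the alert does not state the unit of the two values.

```python
def percent_improvement(before, after):
    """Return the relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Values taken from the alert summary above (unit not stated in the alert).
before, after = 4682.35, 4173.92
print(f"build time reduced by {percent_improvement(before, after):.1f}%")
```

The result rounds to the 11% shown in the alert.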