Closed Bug 1354273 Opened 3 years ago Closed 3 years ago
.js | Assertion failure: [unhandlable oom] Failed to allocate slots while tenuring ., at z:/build/build/src/js/src/jscntxt.cpp:1504 (code -2147483645, args "--no-baseline --no-ion")
These failures begin with the landing of bug 1333003: https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=windows%20vm%20debug%20tc%28jit%29&fromchange=7632e3ba826252942ead773e3f70bcec0168f88e&tochange=060bca004d791980ebe510654efb0cab0760f1dc
And by fail, I mean permafail on Windows 7 VM debug tc(Jit).
Similar to bug 1354306, there's a patch https://hg.mozilla.org/mozilla-central/rev/9ab7778f16d9 in bug 1333003 that fixes the jit tests, which previously did *not* run any test cases on taskcluster. The patch just makes the tests run, so I don't think it is the root cause of the intermittent failure.
In this bug and in bug 1354306, we now have frequent failures in jit tests on Windows 7 VM debug, but only on tc. These tests run for Windows 7 VM debug on buildbot without any sign of failure. Failures began with :ting's push because his changes started running those tests on tc(jit). We'll need to fix or disable the affected tests soon. :h4writer -- Can you find someone to investigate the failures? :jmaher -- Would you expect differences between Windows 7 VM tc and non-tc environments? Who should we loop in?
:grenade - we have an example of a tc-only failure here. Can you help describe the OS differences between the buildbot VM and the taskcluster VM, along with any other environment differences?
Flags: needinfo?(jmaher) → needinfo?(rthijssen)
Main differences are:
- tc Win 7 tests run under the GenericWorker user account, which is dynamically created on the instance's first boot. If something needs to exist in the user profile (like registry settings or file-system objects), we must also create it at first boot. Under buildbot, tests run under the cltbld account, which is created by GPO on top of an MDT image where some profile objects were historically created manually and are undocumented (or the documentation was lost). It's entirely possible we've omitted some aspect of the user profile on tc test instances.
- Buildbot test slaves use an older version of Python (2.7.3, I believe); tc uses 2.7.11. We used to see problems caused by this difference, but I thought most had been resolved (or maybe the bb instances were upgraded; I'm out of the loop there).
- pmoore did a lot of work to understand and fix problems around Python subprocess creation, which was problematic on tc due to our use of Go code to parent the build processes. I know very little about how that all works, but it used to be a big source of pain in test output differences.
:naveed, can you help find someone to look at this bug? between this bug and bug 1354076, we have 54 failures in the last week.
(In reply to Rob Thijssen (:grenade - CEST) from comment #8) This looks like OOM. Are there memory configuration differences between buildbot and taskcluster? For reference, this test takes ~520 MiB max RSS on linux64 (I don't have access to Windows to test on right now).
Odd that we get OOM only on taskcluster; maybe there are settings at the OS level or in the user shell that cause this.
I looked at why this test was taking so much memory. Ironically enough, it's an OOM test. When we do something like |oomTest(newGlobal)| we partially create over a thousand globals, which takes up a lot of memory. This doesn't trip any of our GC triggers, however. I think this is because they're based on things like total memory used, not the number of compartments. We should investigate why this is, but for now let's just force a GC if our OOM testing creates too many compartments. This patch reduces the memory needed by the test to ~40 MiB. My patches in bug 1352430 make this test permafail, but it passes with this patch applied.
Assignee: nobody → jcoppeard
Attachment #8860322 - Flags: review?(sphink)
I filed bug 1358435 for the followup investigation.
Win 7 instances are limited to 3 GB RAM because of the 32-bit OS limitation, but this is the same between bb and tc. I'm not aware of any other memory configuration difference.
Comment on attachment 8860322 [details] [diff] [review] bug1354273-oomtest-gc Review of attachment 8860322 [details] [diff] [review]: ----------------------------------------------------------------- Seems fine as a stopgap, though it also makes me wonder if we *should* persistently account for things like this and use them as additional GC triggers. I wouldn't expect this to happen much at all with normal sites, but it could happen with buggy JS that creates too many iframes or something, and could easily be used as a malicious denial-of-service attack. But that last point is not all that convincing; I'm sure there are plenty of ways to DoS the web platform.
Attachment #8860322 - Flags: review?(sphink) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/6f40efd60fcb GC during OOMTest if we create too many compartments r=sfink
Whiteboard: [stockwell needswork] → [stockwell fixed]