Closed
Bug 1262015
Opened 8 years ago
Closed 7 years ago
Intermittent browser_wa_reset-01.js | Found a tab after previous test timed out: doc_simple-context.html - | application crashed [@ js::gc::ZoneCellIterImpl::ZoneCellIterImpl(JS::Zone *,js::gc::AllocKind)]
Categories
(Core :: JavaScript: GC, defect, P3)
People
(Reporter: KWierso, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure)
Updated•8 years ago
Component: Developer Tools: Web Audio Editor → JavaScript: GC
Product: Firefox → Core
Comment 2•8 years ago
Paul, I pinged Terrence about this on IRC and he said there's probably an underlying tracing/rooting issue in this code. Any chance you can take a look?
Flags: needinfo?(padenot)
Comment 3•8 years ago
This might just be another shutdown issue. I put a possible fix for all MediaStreamGraph-related issues in bug 1267600.
Flags: needinfo?(padenot)
Comment 10•8 years ago
This appears to still be hitting with high frequency. Any chance you could take another look, Paul?
Flags: needinfo?(padenot)
Comment 11•8 years ago
I can, but I might need some hints to debug this. Andrew, if this is a tracing/rooting issue, do we have a way to debug this? I suppose I could push some instrumentation on try and retrigger like crazy or something. I'm afraid I know close to nothing about all this.
Flags: needinfo?(padenot) → needinfo?(continuation)
Comment 12•8 years ago
I see at least two different assertions here:
* isNurseryAllocAllowed: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=112939#L20448
* rt->gc.nursery.isEmpty(): https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-aurora&job_id=2861093
The first was more common in the logs I looked at. These failures should be starred according to the assertion message, not just the fact that there was a crash. I'm not sure what these assertions mean; maybe Terrence could help. I did notice that these assertions seem to happen shortly after a "WARNING: Audio Buffer is not full by the end of the callback." message, so maybe that's related.
Flags: needinfo?(continuation) → needinfo?(terrence)
Comment 13•8 years ago
This means that something tried to allocate a generic object while there was an AutoAssertNoNurseryAlloc on the stack. This happens frequently when someone calls into script or uses the SpiderMonkey API from a callback where that is not allowed. The same is true of the nursery.isEmpty() assertion, which can only fire if script usage occurs in a GC callback. This will be trivial to track down if we can find a clean crash stack. So far all the ones I've checked have been hopelessly corrupted: e.g. arena_dalloc cannot possibly call js::Interpret. I'll keep looking.
Flags: needinfo?(terrence)
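For readers unfamiliar with the pattern described in comment 13, here is a minimal sketch of a stack-scoped "no nursery allocation" guard. It is not SpiderMonkey's implementation: ToyRuntime, ToyNoNurseryAllocGuard and toyNurseryAllocate are hypothetical names invented for illustration, and the sketch throws an exception where a debug build would hit an assertion.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a runtime; not SpiderMonkey's real data structure.
struct ToyRuntime {
    std::size_t noNurseryAllocDepth = 0;  // number of guards currently on the stack
    std::size_t nurseryAllocations = 0;   // count of allocations that were permitted
};

// RAII guard: while an instance is alive, nursery allocation is forbidden.
class ToyNoNurseryAllocGuard {
public:
    explicit ToyNoNurseryAllocGuard(ToyRuntime& rt) : rt_(rt) {
        ++rt_.noNurseryAllocDepth;
    }
    ~ToyNoNurseryAllocGuard() { --rt_.noNurseryAllocDepth; }

    ToyNoNurseryAllocGuard(const ToyNoNurseryAllocGuard&) = delete;
    ToyNoNurseryAllocGuard& operator=(const ToyNoNurseryAllocGuard&) = delete;

private:
    ToyRuntime& rt_;
};

// Allocation is allowed only when no guard is active.
inline bool isNurseryAllocAllowed(const ToyRuntime& rt) {
    return rt.noNurseryAllocDepth == 0;
}

// Simulated allocation path. Throws where a debug build would assert.
inline void toyNurseryAllocate(ToyRuntime& rt) {
    if (!isNurseryAllocAllowed(rt)) {
        throw std::logic_error("isNurseryAllocAllowed");
    }
    ++rt.nurseryAllocations;
}
```

In this toy model, a callback that runs while a guard is alive and ends up on the allocation path trips the check. That is the shape of failure comment 13 describes, and it is why a clean stack would point at the offending caller.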
Comment 14•8 years ago
Looking at tens of stacks from the most recent orangefactor report shows that these are:
* Only on M-e10s(dt7)
* Only in debug builds
* On all versions of Windows (although mostly Win7)
* All have a very similar, but essentially broken stack trace
I think this is most likely either a miscompilation or some sort of really, really nasty heap corruption. Looking at crashstats for tryNewNurseryGCThing, I see [1]. So this may be an issue we've released. Unfortunately, the stacks on those reports are even more broken. I'm afraid that this bug is going to require a dmajor level of debugging skill to investigate successfully.
[1] https://crash-stats.mozilla.org/signature/?signature=js%3A%3Agc%3A%3AGCRuntime%3A%3AtryNewNurseryObject%3CT%3E&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports
Comment 15•8 years ago
(In reply to Terrence Cole [:terrence] from comment #14)
> * On all versions of windows (although mostly win7)
As mentioned on IRC, this is probably because WinXP/Win8 only have M-e10s enabled on Ash and on the release branches, where volume is obviously much lower. WinXP and Win8 are still run on in-house machines, so that at least makes it seem unlikely to be an issue with AWS machine configs or something.
Comment 16•8 years ago
Sounds like the kind of situation the Uptime team might be interested in too.
Comment 17•8 years ago
Terrence, FWIW, bug 1240231 was hitting on OSX too, so I'm not sure this is a compiler issue unless it's something that manages to affect multiple different ones. But I'm also wondering if it's worth throwing rr-chaos at it at this point to see if we can hit it on Linux too under the right circumstances.
Updated•8 years ago
Blocks: e10s-tests
status-firefox47:
--- → wontfix
status-firefox48:
--- → affected
status-firefox49:
--- → affected
status-firefox50:
--- → affected
tracking-e10s:
--- → ?
Comment 18•8 years ago
I tracked bug 1237795 down to bug 1132501. Hopefully that helps shed some light on this.
Updated•8 years ago
Priority: -- → P3
Comment 22•8 years ago
Something made this stop on trunk around July 20. I wonder what!
Flags: needinfo?(terrence)
Comment 23•8 years ago
Nothing stands out. It probably wouldn't though if it was a heap corruption or undefined behavior.
Flags: needinfo?(terrence)
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE