1262015 - Intermittent browser_wa_reset-01.js | Found a tab after previous test timed out: doc_simple-context.html - | application crashed [@ js::gc::ZoneCellIterImpl::ZoneCellIterImpl(JS::Zone *,js::gc::AllocKind)]

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Reporter

Description

•

8 years ago

https://treeherder.mozilla.org/logviewer.html#?job_id=8420235&repo=fx-team

Comment hidden (Intermittent Failures Robot)

5 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 4
* fx-team: 1

Platform breakdown:
* windows7-32: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1262015&startday=2016-04-04&endday=2016-04-10&tree=all

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Component: Developer Tools: Web Audio Editor → JavaScript: GC

Product: Firefox → Core

Ryan VanderMeulen [:RyanVM]

Comment 2

•

8 years ago

Paul, I pinged Terrence about this on IRC and he said there's probably an underlying tracing/rooting issue in this code. Any chance you can take a look?

Flags: needinfo?(padenot)

Paul Adenot (:padenot)

Comment 3

•

8 years ago

This might just be another shutdown issue. I put a possible fix for all MediaStreamGraph-related issues in bug 1267600.

Flags: needinfo?(padenot)

Comment hidden (Intermittent Failures Robot)

39 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 32
* fx-team: 3
* mozilla-central: 2
* try: 1
* ash: 1

Platform breakdown:
* windows7-32: 36
* windows8-64: 1
* linux64: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1262015&startday=2016-05-30&endday=2016-06-05&tree=all

Comment hidden (Intermittent Failures Robot)

16 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 12
* try: 1
* mozilla-central: 1
* mozilla-aurora: 1
* fx-team: 1

Platform breakdown:
* windows7-32: 16

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1262015&startday=2016-06-08&endday=2016-06-08&tree=all

Comment hidden (Intermittent Failures Robot)

62 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 42
* fx-team: 9
* try: 6
* mozilla-central: 4
* mozilla-aurora: 1

Platform breakdown:
* windows7-32: 62

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1262015&startday=2016-06-06&endday=2016-06-12&tree=all

Comment hidden (Intermittent Failures Robot)

19 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 12
* fx-team: 5
* mozilla-central: 2

Platform breakdown:
* windows7-32: 18
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1262015&startday=2016-06-13&endday=2016-06-19&tree=all

Comment hidden (Intermittent Failures Robot)

37 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 26
* fx-team: 4
* mozilla-aurora: 3
* mozilla-central: 2
* autoland: 1
* ash: 1

Platform breakdown:
* windows7-32-vm: 25
* windows7-32: 10
* windows8-64: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1262015&startday=2016-06-20&endday=2016-06-26&tree=all

Comment hidden (Intermittent Failures Robot)

31 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 13
* fx-team: 6
* autoland: 6
* try: 2
* mozilla-aurora: 2
* mozilla-central: 1
* ash: 1

Platform breakdown:
* windows7-32-vm: 28
* windowsxp: 2
* windows8-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1262015&startday=2016-06-27&endday=2016-07-03&tree=all

Ryan VanderMeulen [:RyanVM]

Comment 10

•

8 years ago

This appears to still be hitting with high frequency. Any chance you could take another look, Paul?

Flags: needinfo?(padenot)

Paul Adenot (:padenot)

Comment 11

•

8 years ago

I can, but I might need some hints to debug this.

Andrew, if this is a tracing/rooting issue, do we have a way to debug this? I suppose I could push some instrumentation on try and retrigger like crazy or something. I'm afraid I know close to nothing about all this.

Flags: needinfo?(padenot) → needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 12

•

8 years ago

I see at least two different assertions here: isNurseryAllocAllowed
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=112939#L20448

and rt->gc.nursery.isEmpty()
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-aurora&job_id=2861093

The first was more common in the logs I looked at. These failures should be starred by the assertion message, not by the fact that there was a crash.

I'm not sure what these assertions mean, maybe Terrence could help.

I did notice that these assertions seem to be happening shortly after a "WARNING: Audio Buffer is not full by the end of the callback." message, so maybe that's related.

Flags: needinfo?(continuation) → needinfo?(terrence)

Terrence Cole [:terrence]

Comment 13

•

8 years ago

This means that something tried to allocate a generic object while there was an AutoAssertNoNurseryAlloc on the stack. This happens frequently if someone tries to call a script or use the SpiderMonkey API from a callback where that is not allowed. The same is true of the nursery.isEmpty() assertion. The latter can only happen if script usage occurs in a GC callback.

This will be trivial to track down if we can find a clean crash stack. So far all the ones I've checked have been hopelessly corrupted: e.g. arena_dalloc cannot possibly call js::Interpret. I'll keep looking.
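
As a rough illustration of the mechanism described above, here is a minimal C++ sketch of a scoped "no nursery allocation" guard. Every name in it (ToyRuntime, ToyNoNurseryAllocGuard, tryNurseryAlloc, callbackThatAllocates) is hypothetical and made up for this example; it is not SpiderMonkey's actual implementation, and the real AutoAssertNoNurseryAlloc asserts in debug builds rather than returning a status.

```cpp
#include <cassert>

// Hypothetical stand-in for a JS runtime; not a real SpiderMonkey type.
struct ToyRuntime {
    int noNurseryAllocDepth = 0;  // number of guards currently on the stack
    int nurseryAllocations = 0;   // successful toy allocations so far

    bool isNurseryAllocAllowed() const { return noNurseryAllocDepth == 0; }

    // Returns false at the point where a debug build would hit the
    // isNurseryAllocAllowed assertion (assumption: modeled as a status
    // code here so the example can run to completion).
    bool tryNurseryAlloc() {
        if (!isNurseryAllocAllowed()) {
            return false;
        }
        ++nurseryAllocations;
        return true;
    }
};

// Hypothetical RAII guard: while an instance is alive, nursery
// allocation on the given runtime is forbidden.
class ToyNoNurseryAllocGuard {
public:
    explicit ToyNoNurseryAllocGuard(ToyRuntime& rt) : rt_(rt) {
        ++rt_.noNurseryAllocDepth;
    }
    ~ToyNoNurseryAllocGuard() { --rt_.noNurseryAllocDepth; }

    ToyNoNurseryAllocGuard(const ToyNoNurseryAllocGuard&) = delete;
    ToyNoNurseryAllocGuard& operator=(const ToyNoNurseryAllocGuard&) = delete;

private:
    ToyRuntime& rt_;
};

// Models a callback that runs something allocating (for example script)
// while the guard is still on the stack.
bool callbackThatAllocates(ToyRuntime& rt) {
    ToyNoNurseryAllocGuard guard(rt);
    return rt.tryNurseryAlloc();  // rejected: the guard is active
}
```

In this toy model an allocation succeeds outside the guarded scope and is rejected inside it, which is the shape of failure the isNurseryAllocAllowed assertion reports. It says nothing about which callback is actually responsible in this bug.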

Flags: needinfo?(terrence)

Terrence Cole [:terrence]

Comment 14

•

8 years ago

Looking at tens of stacks from the most recent orangefactor report shows that these are:
  * Only on M-e10s(dt7)
  * Only in debug builds
  * On all versions of windows (although mostly win7)
  * All have a very similar, but essentially broken stack trace

I think this is most likely either a miscompilation or some sort of really, really nasty heap corruption. Looking at crashstats for tryNewNurseryGCThing, I see [1]. So this may be an issue we've released. Unfortunately, the stacks on those reports are even more broken.

I'm afraid that this bug is going to require a dmajor level of debugging skill to investigate successfully.

[1] https://crash-stats.mozilla.org/signature/?signature=js%3A%3Agc%3A%3AGCRuntime%3A%3AtryNewNurseryObject%3CT%3E&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports

Ryan VanderMeulen [:RyanVM]

Comment 15

•

8 years ago

(In reply to Terrence Cole [:terrence] from comment #14)
>   * On all versions of windows (although mostly win7)

As mentioned on IRC, this is probably because WinXP/Win8 only have M-e10s enabled on Ash and on the release branches where volume is obviously much lower. WinXP and Win8 are run on in-house machines still, so that at least makes it seem unlikely to be an issue with AWS machine configs or something.

Ryan VanderMeulen [:RyanVM]

Comment 16

•

8 years ago

Sounds like the kind of situation the Uptime team might be interested in too.

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Comment 17

•

8 years ago

Terrence, FWIW, bug 1240231 was hitting on OSX too, so I'm not sure this is a compiler issue unless it's something that manages to affect multiple different ones. But I'm also wondering if it's worth throwing rr-chaos at it at this point to see if we can hit it on Linux too under the right circumstances.

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Blocks: e10s-tests

status-firefox47: --- → wontfix

status-firefox48: --- → affected

status-firefox49: --- → affected

status-firefox50: --- → affected

tracking-e10s: --- → ?

Ryan VanderMeulen [:RyanVM]

Comment 18

•

8 years ago

I tracked bug 1237795 down to bug 1132501. Hopefully that helps shed some light on this.

Jim Mathies [:jimm]

Updated

•

8 years ago

tracking-e10s: ? → +

Priority: -- → P3

Comment hidden (Intermittent Failures Robot)

34 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 16
* mozilla-aurora: 6
* fx-team: 5
* autoland: 5
* mozilla-central: 1
* ash: 1

Platform breakdown:
* windows7-32-vm: 30
* windows8-64: 4

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1262015&startday=2016-07-04&endday=2016-07-10&tree=all

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Blocks: 1285232

Comment hidden (Intermittent Failures Robot)

22 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 8
* mozilla-central: 5
* fx-team: 5
* autoland: 2
* mozilla-aurora: 1
* ash: 1

Platform breakdown:
* windows7-32-vm: 21
* windows8-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1262015&startday=2016-07-11&endday=2016-07-17&tree=all

Comment hidden (Intermittent Failures Robot)

18 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* autoland: 7
* mozilla-inbound: 4
* mozilla-aurora: 4
* fx-team: 3

Platform breakdown:
* windows7-32-vm: 16
* windows8-64: 1
* android-4-3-armv7-api15: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1262015&startday=2016-07-18&endday=2016-07-24&tree=all

Ryan VanderMeulen [:RyanVM]

Comment 22

•

8 years ago

Something made this stop on trunk around July 20. I wonder what!

status-firefox48: affected → wontfix

Flags: needinfo?(terrence)

Terrence Cole [:terrence]

Comment 23

•

8 years ago

Nothing stands out. It probably wouldn't though if it was a heap corruption or undefined behavior.

Flags: needinfo?(terrence)

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → INCOMPLETE