Closed Bug 1366083 Opened 3 years ago Closed 2 years ago

Intermittent browser_net_statistics-02.js,browser_net_sort-02.js,browser_net_api-calls.js,browser_net_icon-preview.js | application crashed [@ js::GCMarker::processMarkStackTop(js::SliceBudget &)]

Categories

(Core :: JavaScript: GC, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr60 --- wontfix

People

(Reporter: intermittent-bug-filer, Assigned: sfink)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure, Whiteboard: [stockwell unknown])

Crash Data

Attachments

(2 files, 2 obsolete files)

Component: Developer Tools: Netmonitor → JavaScript: GC
Product: Firefox → Core
Ryan, is there somebody who could bisect down the source of the revision on Treeherder? Thanks.
Blocks: 719114
Flags: needinfo?(ryanvm)
Maybe one of our intrepid stockwell folks can :)
Flags: needinfo?(ryanvm) → needinfo?(jmaher)
This started on autoland on May 18. It is a very frequent intermittent on Windows Debug e10s only.

https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=windows%20debug%20dt5&tochange=d18aa0ba5171b293f46fc5f095859d960a55c303&fromchange=249246e433dc06b270d0e5e5cfb446b97e879fe4

suggests this may have been triggered by 1365635.

:rickychien - can you have a look?
Flags: needinfo?(jmaher)
Flags: needinfo?(rchien)
Whiteboard: [stockwell needswork]
On the one hand it is surprising that a JS-only change could cause a GC crash. On the other hand, it was a change to netmonitor and the crashes are in netmonitor tests.
See Also: → 1366153
Assignee: nobody → rchien
Status: NEW → ASSIGNED
Flags: needinfo?(rchien)
Honza, I'm not sure why bug 1365635 only caused this weird GC crash on win32 debug. I can't figure it out how that happens. However, most of platforms looks like irrelevant. I'd prefer to disable it for win32 debug.
I agree we should just disable the test on win32 and leave JS team to investigate further.
Keywords: leave-open
(In reply to Ricky Chien [:rickychien] from comment #9)
> Honza, I'm not sure why bug 1365635 only caused this weird GC crash on win32
> debug. I can't figure it out how that happens. However, most of platforms
> looks like irrelevant. I'd prefer to disable it for win32 debug.

A similar crash is also happening on Win64 (bug 1366296) and OSX (bug 1366153).
See Also: → 1366296
See Also: → 1365564
Attachment #8869671 - Attachment is obsolete: true
Attachment #8869671 - Flags: review?(odvarko)
After having a discussion with performance team engineers, we believe this might be a JS engine or memory relevant issue but we hit unluckily. We are going to leave it open and let us deal with it in bug 1366296.
Assignee: rchien → nobody
Status: ASSIGNED → NEW
(In reply to Ricky Chien [:rickychien] from comment #13)
> After having a discussion with performance team engineers, we believe this
> might be a JS engine or memory relevant issue but we hit unluckily. We are
> going to leave it open and let us deal with it in bug 1366296.

It would still probably be a good idea to disable this test on Win32 debug given how frequently this crash is happening, just to improve the orange factor.
after May 24th, this has reduced significantly (12 failures in 2 weeks):
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1366083&startday=2017-05-24&endday=2017-06-07&tree=all
Whiteboard: [stockwell needswork] → [stockwell unknown]
Summary: Intermittent browser_net_statistics-02.js | application crashed [@ js::GCMarker::processMarkStackTop(js::SliceBudget &)] → Intermittent browser_net_statistics-02.js,browser_net_sort-02.js | application crashed [@ js::GCMarker::processMarkStackTop(js::SliceBudget &)]
Summary: Intermittent browser_net_statistics-02.js,browser_net_sort-02.js | application crashed [@ js::GCMarker::processMarkStackTop(js::SliceBudget &)] → Intermittent browser_net_statistics-02.js,browser_net_sort-02.js,browser_net_api-calls.js,browser_net_icon-preview.js | application crashed [@ js::GCMarker::processMarkStackTop(js::SliceBudget &)]
Duplicate of this bug: 1386875
this seems to have picked up a bit lately, not enough to make a stockwell bug though
Bulk priority update of open intermittent test failure bugs. 

P3 => P5

https://bugzilla.mozilla.org/show_bug.cgi?id=1381960
Priority: P3 → P5
Steve, would you mind taking a look at this intermittent failure, just enough to make sure we're not overlooking an obvious clue?

Could be lurking JSAPI misuse somewhere...
Flags: needinfo?(sphink)
We're marking a value array, and one of the values is a nullptr JSObject*. The value array is probably the containing object's slots, but it might also be an Array.

I've begin seeing other instances recently of null object values, including ones in release crashes. So it's pretty great that we have these intermittents.

I'm not sure how these null JSObject Values might be introduced. Let's see...

In C++, these should mostly flow through Value::setObject(JSObject&). Which is a little worrisome, because although there are various asserts that the reference is valid, dereferencing a nullptr is UB and so I could imagine the compiler discarding all the null checks both before and after the setObject call. I haven't looked at the disassembly to check, though.

Other sources of Values I can think of are memcpy and the JIT.

This is a hot path, but this seems like it has enough value to warrant some diagnostics. Dumping out the containing JSObject seems like it could be very revealing -- assuming the dumping doesn't dereference the same null pointer, anyway. And it won't tell us what wrote that Value in, but it might make it easier to track down.
Flags: needinfo?(sphink)
I'm going to upgrade the priority of this bug because I am hopeful that it might uncover a more serious problem that is dogging us right now, and for whatever reason, the work is happening here.
Priority: P5 → P2
Oh, I thought I'd submitted this already, but I was waiting on a try run. Good news is that it doesn't seem to have broken anything. Bad news is that it didn't break anything and find bugs.
Attachment #8928768 - Flags: review?(jcoppeard)
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Comment on attachment 8928768 [details] [diff] [review]
diagnostic patch for ObjectValue(nullptr)

Review of attachment 8928768 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Marking.cpp
@@ +1760,5 @@
> +                fprintf(stderr,
> +                        "processMarkStackTop found ObjectValue(nullptr) "
> +                        "at %zu Values from end of array in object:\n",
> +                        end - (vp - 1));
> +                DumpObject(obj);

Worth a MOZ_CRASH here too?
Attachment #8928768 - Flags: review?(jcoppeard) → review+
(In reply to Jon Coppeard (:jonco) (PTO until 20th Nov) from comment #32)
> Comment on attachment 8928768 [details] [diff] [review]
> diagnostic patch for ObjectValue(nullptr)
> 
> Review of attachment 8928768 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/gc/Marking.cpp
> @@ +1760,5 @@
> > +                fprintf(stderr,
> > +                        "processMarkStackTop found ObjectValue(nullptr) "
> > +                        "at %zu Values from end of array in object:\n",
> > +                        end - (vp - 1));
> > +                DumpObject(obj);
> 
> Worth a MOZ_CRASH here too?

I was leaving it out in hopes that the crashes would continue to be routed to this bug; would that still happen if I changed the crash site via MOZ_CRASH?
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3059e975eef8
diagnostic patch for ObjectValue(nullptr), r=jonco
Backout by archaeopteryx@coole-files.de:
https://hg.mozilla.org/mozilla-central/rev/010374bce606
Backed out 5 changesets (bug 1366083, bug 1410528, bug 1417267) for failing Spidermonkey jstests and jit-tests r=backout on a CLOSED TREE
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/29cc33a514b6
diagnostic patch for ObjectValue(nullptr), r=jonco
While I'm at it, let me put one closer to the source. Just pushing to try with this for now.
It seems it's time to up the bar. The problem that the previous assert was added for doesn't seem to be happening now. But ObjectValue(nullptr) appears to still be causing issues in other bugs. I would like to try landing a more draconian diagnostic assert for a while to see if it catches anything.
Attachment #8953192 - Flags: review?(jcoppeard)
Attachment #8930766 - Attachment is obsolete: true
Flags: needinfo?(sphink)
Comment on attachment 8953192 [details] [diff] [review]
Diagnostic assert for ObjectValue(nullptr)

Review of attachment 8953192 [details] [diff] [review]:
-----------------------------------------------------------------

Let's do it.
Attachment #8953192 - Flags: review?(jcoppeard) → review+
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c28735015441
Diagnostic assert for ObjectValue(nullptr), r=jonco
The leave-open keyword is there and there is no activity for 6 months.
:sfink, maybe it's time to close this bug?
Flags: needinfo?(sphink)
Depends on: 1509985
Yes. I think we should probably back this out now, but I guess that makes sense to do in another bug since it's a totally different version now.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(sphink)
Resolution: --- → FIXED
Keywords: leave-open
Target Milestone: --- → mozilla60
You need to log in before you can comment on or make changes to this bug.